Abstract



Neural networks can synchronize by learning from each other. For that pur- 
pose they receive common inputs and exchange their outputs. Adjusting discrete 
weights according to a suitable learning rule then leads to full synchronization in 
a finite number of steps. It is also possible to train additional neural networks by 
using the inputs and outputs generated during this process as examples. Several 
algorithms for both tasks are presented and analyzed. 

In the case of Tree Parity Machines the dynamics of both processes is driven 
by attractive and repulsive stochastic forces. Thus it can be described well by 
models based on random walks, which represent either the weights themselves or 
order parameters of their distribution. However, synchronization is much faster 
than learning. This effect is caused by different frequencies of attractive and 
repulsive steps, as only neural networks interacting with each other are able to 
skip unsuitable inputs. Scaling laws for the number of steps needed for full syn- 
chronization and successful learning are derived using analytical models. They 
indicate that the difference between both processes can be controlled by changing 
the synaptic depth. In the case of bidirectional interaction the synchronization 
time increases proportional to the square of this parameter, but it grows expo- 
nentially, if information is transmitted in one direction only. 

Because of this effect neural synchronization can be used to construct a cryp- 
tographic key-exchange protocol. Here the partners benefit from mutual inter- 
action, so that a passive attacker is usually unable to learn the generated key 
in time. The success probabilities of different attack methods are determined by 
numerical simulations and scaling laws are derived from the data. If the synap- 
tic depth is increased, the complexity of a successful attack grows exponentially, 
but there is only a polynomial increase of the effort needed to generate a key. 
Therefore the partners can reach any desired level of security by choosing suit- 
able parameters. In addition, the entropy of the weight distribution is used to 
determine the effective number of keys, which are generated in different runs of 
the key-exchange protocol using the same sequence of input vectors. 

If the common random inputs are replaced with queries, synchronization is 
possible, too. However, the partners have more control over the difficulty of the 
key exchange and the attacks. Therefore they can improve the security without 
increasing the average synchronization time. 
Summary



Neural networks that receive the same inputs and exchange their outputs can
learn from each other and synchronize in this way. If discrete weights and a
suitable learning rule are used, full synchronization is reached in a finite
number of steps. The examples generated in the process can be used to train
additional neural networks. Several algorithms for both tasks are presented and
analyzed.

In Tree Parity Machines, attractive and repulsive stochastic forces drive both
the synchronization process and the learning processes, so that all of these
processes are described well by random-walk models. The random walks are either
the weights themselves or order parameters of their distribution. However,
interacting neural networks are able to skip unsuitable inputs and thereby
partly avoid repulsive steps. Therefore Tree Parity Machines can synchronize
faster than they can learn. Scaling laws derived from analytical models show
that the difference between both processes depends on the synaptic depth. If
the two neural networks can influence each other, the synchronization time
increases only in proportion to this parameter; but it grows exponentially as
soon as information flows in one direction only.

For this reason, neural synchronization can be used to implement a
cryptographic key-exchange protocol. Because the partners influence each other
while the attacker lacks this possibility, the attacker usually does not
succeed in finding the generated key in time. The success probabilities of the
different attacks are determined by numerical simulations. The scaling laws
found in this way show that the complexity of a successful attack grows
exponentially with the synaptic depth, whereas the effort for the key exchange
itself increases only polynomially. Thus the partners can reach any desired
level of security by a suitable choice of parameters. In addition, the
effective number of keys that the key-exchange protocol can generate for a
given time series of inputs is calculated.

The neural key exchange also works if the random inputs are replaced with
queries. In this case, however, the partners have more control over the
complexity of the synchronization and of the attacks. It is therefore possible
to improve the security without increasing the effort.
Chapter 1
Introduction 



Synchronization is an interesting phenomenon, which can be observed in many
physical and also biological systems [1]. It was first discovered for weakly
coupled oscillators, which develop a constant phase relation to each other. While
a lot of systems show this type of synchronization, a periodic time evolution is 
not required. This is clearly visible in the case of chaotic systems. These can be 
synchronized by a common source of noise [i], [sl or by interaction j3, 0] . 

As soon as full synchronization is achieved, one observes two or more systems 
with identical dynamics. But sometimes only parts synchronize. And it is even 
possible that one finds a fixed relation between the states of the systems instead 
of identical dynamics. Thus these phenomena look very different, although they 
are all some kind of synchronization. In most situations it does not matter whether
the interaction is unidirectional or bidirectional. So there is usually no difference
between components that influence each other actively and those that are
passively influenced by the dynamics of other systems.

Recently it has been discovered that artificial neural networks can synchro- 
nize, too p, 0]. These mathematical models have been first developed to study 
and simulate the behavior of biological neurons. But it was soon discovered that 
complex problems in computer science can be solved by using neural networks. 
This is especially true if there is little information about the problem available. 
In this case the development of a conventional algorithm is very difficult or even 
impossible. In contrast, neural networks have the ability to learn from exam- 
ples. That is why one does not have to know the exact rule in order to train 
a neural network. In fact, it is sufficient to give some examples of the desired 
classification and the network takes care of the generalization. Several methods 
and applications of neural networks can be found in [8].

A feed-forward neural network defines a mapping between its input vector $\mathbf{x}$
and one or more output values $\sigma_i$. Of course, this mapping is not fixed, but can
be changed by adjusting the weight vector $\mathbf{w}$, which defines the influence of each
input value on the output. For the update of the weights there are two basic 
algorithms possible: In batch learning all examples are presented at the same 
time and then an optimal weight vector is calculated. Obviously, this only works
for static rules. But in online learning only one example is used in each time
step. Therefore it is possible to train a neural network using dynamical rules,
which change over time. Thus the examples can be generated by another neural
network, which adjusts its weights, too.

Figure 1.1: Key exchange between two partners with a passive attacker listening
to the communication.

This approach leads to interacting neural feed-forward networks, which syn- 
chronize by mutual learning j^. They receive common input vectors and are 
trained using the outputs of the other networks. After a short time full synchro- 
nization is reached and one observes either parallel or anti-parallel weight vectors, 
which stay synchronized, although they move in time. Similar to other systems 
there is no obvious difference between unidirectional and bidirectional interaction 
in the case of simple perceptrons [o^. 

But Tree Parity Machines, which are more complex neural networks with a
special structure, show a new phenomenon. Synchronization by mutual learning
is much faster than learning by adapting to examples generated by other networks.
Therefore one can distinguish active and passive participants in such a
communication. This allows for new applications, which are not possible with
the systems known before. In particular, the idea of using neural synchronization
for a cryptographic key-exchange protocol, which was first proposed in [13], has
stimulated most research in this area [9-12, 14-24].



Such an algorithm can be used to solve a common cryptographic problem [25]:
Two partners Alice and Bob want to exchange secret messages over a public
channel. In order to protect the content against an opponent Eve, A encrypts
her message using a fast symmetric encryption algorithm. But now B needs to
know A's key for reading her message. This situation is depicted in figure 1.1.



In fact, there are three possible solutions for this key-exchange problem [26].
First, A and B could use a second private channel to transmit the key, e.g. they
could meet in person for this purpose. But usually this is very difficult or just
impossible. Alternatively, the partners can use public-key cryptography. Here
an asymmetric encryption algorithm is employed so that the public keys of A's
and B's key pair can be exchanged between the partners without the need to
keep them secret. But asymmetric encryption is much slower than symmetric
algorithms. That is why it is only used to transmit a symmetric session key.
However, one can achieve the same result by using a key-exchange protocol. In
this case messages are transmitted over the public channel and afterwards A and
B generate a secret key based on the exchanged information. But E is unable to
discover the key because listening to the communication is not sufficient.



Such a protocol can be constructed using neural synchronization [13]. Two
Tree Parity Machines, one for A and one for B respectively, start with random
initial weights, which are kept secret. In each step a new random input vector
is generated publicly. Then the partners calculate the output of their neural
networks and send it to each other. Afterwards the weight vectors are updated
according to a suitable learning rule. Because both inputs and weights are discrete,
this procedure leads to full synchronization, $\mathbf{w}_i^A = \mathbf{w}_i^B$, after a finite number of
steps. Then A and B can use the weight vectors as a common secret key.

In this case the difference between unidirectional learning and bidirectional
synchronization is essential for the security of the cryptographic application. As
E cannot influence A and B, she is usually not able to achieve synchronization by
the time A and B finish generating the key and stop the transmission of the output
bits. Consequently, attacks based on learning have only a small probability
of success [16]. But using other methods is difficult, too. After all, the attacker
does not know the internal representation of the multi-layer neural networks. In
contrast, it is easy to reconstruct the learning process of a perceptron exactly
due to the lack of hidden units. This corresponds with the observation that E is
nearly always successful, if these simple networks are used [9].

Of course, one wants to compare the level of security achieved by the neural 
key-exchange protocol with other algorithms for key exchange. For that purpose 
some assumptions are necessary, which are standard for all cryptographic systems: 

• The attacker E knows all the messages exchanged between A and B. Thus 
each participant has the same amount of information about all the oth- 
ers. Furthermore the security of the neural key-exchange protocol does not 
depend on some special properties of the transmission channel. 

• E is unable to change the messages, so that only passive attacks are con- 
sidered. In order to achieve security against active methods, e. g. man-in- 
the-middle attacks, one has to implement additional provisions for authen- 
tication. 

• The algorithm is public, because keeping it secret does not improve the se- 
curity at all, but prevents cryptographic analysis. Although vulnerabilities 
may not be revealed, if one uses security by obscurity, an attacker can find 
them nevertheless. 
In chapter 2 the basic algorithm for neural synchronization is explained.
Definitions of the order parameters used to analyze this effect can be found there,
too. Additionally, it contains descriptions of all known methods for E's attacks
on the neural key-exchange protocol.

Then the dynamics of neural synchronization is discussed in chapter 3. It is
shown that it is, in fact, a complex process driven by stochastic attractive and 
repulsive forces, whose properties depend on the chosen parameters. Looking 
especially at the average change of the overlap between corresponding hidden 
units in A's, B's and E's Tree Parity Machine reveals the differences between 
bidirectional and unidirectional interaction clearly. 

Chapter 4 focuses on the security of the neural key-exchange protocol, which
is essential for this application of neural synchronization. Of course, simulations
of cryptographically useful systems do not show successful attacks, and vice
versa. That is why finding scaling laws with regard to effort and security is very
important. As these relations can be used to extrapolate reliably, they play a
major role here.

Finally, chapter 5 presents a modification of the neural key-exchange protocol:
Queries generated by A and B replace the random sequence of input vectors. Thus 
the partners have more influence on the process of synchronization, because they 
are able to control the frequency of repulsive steps as a function of the overlap. In 
doing so, A and B can improve the security of the neural key-exchange protocol 
without increasing the synchronization time. 



Chapter 2 

Neural synchronization 



Synchronization of neural networks is a special case of an online learning
situation. Two neural networks start with randomly chosen weight vectors.
In each time step they receive a common input vector, calculate their outputs,
and communicate them to each other. If they agree on the mapping between the
current input and the output, their weights are updated according to a suitable
learning rule.

In the case of discrete weight values this process leads to full synchronization
in a finite number of steps [9-12, 27]. Afterwards corresponding weights in both
networks have the same value, even if they are updated by further applications
of the learning rule. Thus full synchronization is an absorbing state.

Additionally, a third neural network can be trained using the examples, input 
vectors and output values, generated by the process of synchronization. As this 
neural network cannot influence the others, it corresponds to a student network 
which tries to learn a time dependent mapping between inputs and outputs. 

In the case of perceptrons, which are simple neural networks, one cannot find 
any significant difference between these two situations: the average number of 
steps needed for synchronization and learning is the same [g], 0]. But in the case 
of the more complex Tree Parity Machines an interesting phenomenon can be
observed: two neural networks learning from each other synchronize faster than
a third network only listening to the communication [9-12].

This difference between bidirectional and unidirectional interaction can be
used to solve the cryptographic key-exchange problem [13]. For that purpose
the partners A and B synchronize their Tree Parity Machines. In doing so they
generate their common session key faster than an attacker is able to discover it
by training another neural network. Consequently, the difference between
synchronization and learning is essential for the security of the neural key-exchange
protocol.

In this chapter the basic framework for neural synchronization is presented. 
This includes the structure of the networks, the learning rules, and the quantities 
used to describe the process of synchronization. 
2.1 Tree Parity Machines

Tree Parity Machines, which are used by partners and attackers in neural
cryptography, are multi-layer feed-forward networks. Their general structure is shown
in figure 2.1.

Figure 2.1: A Tree Parity Machine with K = 3 and N = 4.

Such a neural network consists of K hidden units, which are perceptrons with
independent receptive fields. Each one has N input neurons and one output
neuron. All input values are binary,

$$x_{i,j} \in \{-1, +1\}, \tag{2.1}$$

and the weights, which define the mapping from input to output, are discrete
numbers between $-L$ and $+L$,

$$w_{i,j} \in \{-L, -L+1, \ldots, +L\}. \tag{2.2}$$

Here the index $i = 1, \ldots, K$ denotes the i-th hidden unit of the Tree Parity
Machine and $j = 1, \ldots, N$ the elements of the vector.

As in other neural networks the weighted sum over the current input values
is used to determine the output of the hidden units. Therefore the full state of
each hidden neuron is given by its local field

$$h_i = \frac{1}{\sqrt{N}} \mathbf{w}_i \cdot \mathbf{x}_i = \frac{1}{\sqrt{N}} \sum_{j=1}^{N} w_{i,j}\, x_{i,j}. \tag{2.3}$$

The output $\sigma_i$ of the i-th hidden unit is then defined as the sign of $h_i$,

$$\sigma_i = \operatorname{sgn}(h_i), \tag{2.4}$$

but the special case $h_i = 0$ is mapped to $\sigma_i = -1$ in order to ensure a binary
output value. Thus a hidden unit is only active, $\sigma_i = +1$, if the weighted sum
over its inputs is positive, otherwise it is inactive, $\sigma_i = -1$.



Then the total output $\tau$ of a Tree Parity Machine is given by the product
(parity) of the hidden units,

$$\tau = \prod_{i=1}^{K} \sigma_i, \tag{2.5}$$

so that $\tau$ only indicates whether the number of inactive hidden units, with $\sigma_i = -1$, is
even ($\tau = +1$) or odd ($\tau = -1$). Consequently, there are $2^{K-1}$ different internal
representations $(\sigma_1, \sigma_2, \ldots, \sigma_K)$, which lead to the same output value $\tau$.

If there is only one hidden unit, $\tau$ is equal to $\sigma_1$. Consequently, a Tree Parity
Machine with K = 1 shows the same behavior as a perceptron, which can be
regarded as a special case of the more complex neural network.
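
The forward pass defined by equations (2.1) to (2.5) can be illustrated with a short sketch. It is only a minimal illustration, assuming NumPy and a weight array of shape (K, N); the function name `tpm_output` and the random seed are hypothetical choices and not part of the original work.

```python
import numpy as np

def tpm_output(w, x):
    """Return (sigma, tau, h) for weights w and inputs x, both of shape (K, N)."""
    n = w.shape[1]
    h = (w * x).sum(axis=1) / np.sqrt(n)   # local fields, eq. (2.3)
    sigma = np.where(h > 0, 1, -1)          # eq. (2.4); h = 0 is mapped to -1
    tau = int(np.prod(sigma))               # parity of the hidden units, eq. (2.5)
    return sigma, tau, h

# Illustrative parameters (assumed): K = 3 hidden units, N = 4 inputs each, L = 3.
rng = np.random.default_rng(0)
K, N, L = 3, 4, 3
w = rng.integers(-L, L + 1, size=(K, N))    # weights in {-L, ..., +L}
x = rng.choice([-1, 1], size=(K, N))        # binary inputs
sigma, tau, h = tpm_output(w, x)
print("sigma =", sigma, "tau =", tau)
```

Only `tau` would be made public in the protocol described later; `sigma` and `h` stay internal to the network.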



2.2 Learning rules 

At the beginning of the synchronization process A's and B's Tree Parity Machines
start with randomly chosen and therefore uncorrelated weight vectors $\mathbf{w}_i^{A/B}$. In
each time step K public input vectors $\mathbf{x}_i$ are generated randomly and the
corresponding output bits $\tau^{A/B}$ are calculated.

Afterwards A and B communicate their output bits to each other. If they
disagree, $\tau^A \neq \tau^B$, the weights are not changed. Otherwise one of the following
learning rules suitable for synchronization is applied:



In the case of the Hebbian learning rule [16] both neural networks learn
from each other:

$$w_{i,j}^{+} = g\bigl(w_{i,j} + x_{i,j}\,\tau\,\Theta(\sigma_i \tau)\,\Theta(\tau^A \tau^B)\bigr). \tag{2.6}$$

It is also possible that both networks are trained with the opposite of their
own output. This is achieved by using the anti-Hebbian learning rule [11]:

$$w_{i,j}^{+} = g\bigl(w_{i,j} - x_{i,j}\,\tau\,\Theta(\sigma_i \tau)\,\Theta(\tau^A \tau^B)\bigr). \tag{2.7}$$

But the set value of the output is not important for synchronization as long
as it is the same for all participating neural networks. That is why one can
use the random-walk learning rule, too:

$$w_{i,j}^{+} = g\bigl(w_{i,j} + x_{i,j}\,\Theta(\sigma_i \tau)\,\Theta(\tau^A \tau^B)\bigr). \tag{2.8}$$

In any case, these learning rules change only the weights in hidden
units with $\sigma_i = \tau$. As a result, it is impossible to tell which weights are
updated without knowing the internal representation $(\sigma_1, \sigma_2, \ldots, \sigma_K)$. This feature
is especially needed for the cryptographic application of neural synchronization.
Of course, the learning rules have to assure that the weights stay in the allowed
range between $-L$ and $+L$. If any weight moves outside this region, it is reset
to the nearest boundary value $\pm L$. This is achieved by the function $g(w)$ in each
learning rule:

$$g(w) = \begin{cases} \operatorname{sgn}(w)\, L & \text{for } |w| > L, \\ w & \text{otherwise.} \end{cases} \tag{2.9}$$

Afterwards the current synchronization step is finished. This process can be
repeated until corresponding weights in A's and B's Tree Parity Machine have
equal values, $w_{i,j}^A = w_{i,j}^B$. Further applications of the learning rule are unable
to destroy this synchronization, because the movements of the weights depend
only on the inputs and weights, which are then identical in A's and B's neural
networks.
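
A synchronization run built from the three learning rules and the boundary function $g(w)$ might look like the following sketch. It assumes NumPy; the helper names (`outputs`, `update`, `synchronize`), the default parameters and the step limit are illustrative assumptions rather than values from the text.

```python
import numpy as np

def outputs(w, x):
    h = (w * x).sum(axis=1)                 # only the sign of the local field is needed here
    sigma = np.where(h > 0, 1, -1)          # h = 0 is mapped to -1
    return sigma, int(np.prod(sigma))

def update(w, x, sigma, tau, L, rule="hebbian"):
    """Apply one learning step in place; call only when tau_A == tau_B."""
    factor = {"hebbian": tau, "anti_hebbian": -tau, "random_walk": 1}[rule]
    mask = (sigma == tau)[:, None]          # Theta(sigma_i tau): only units with sigma_i = tau move
    w += factor * x * mask
    np.clip(w, -L, L, out=w)                # g(w): reset to the nearest boundary value

def synchronize(K=3, N=10, L=3, rule="hebbian", seed=0, max_steps=20_000):
    """Return (steps, weights), or (None, weights) if max_steps is exceeded."""
    rng = np.random.default_rng(seed)
    wa = rng.integers(-L, L + 1, size=(K, N))
    wb = rng.integers(-L, L + 1, size=(K, N))
    for t in range(1, max_steps + 1):
        x = rng.choice([-1, 1], size=(K, N))   # public random input vectors
        sa, ta = outputs(wa, x)
        sb, tb = outputs(wb, x)
        if ta == tb:                           # Theta(tau_A tau_B) = 1
            update(wa, x, sa, ta, L, rule)
            update(wb, x, sb, tb, L, rule)
        if np.array_equal(wa, wb):
            return t, wa
    return None, wa

steps, key = synchronize()
print("synchronized after", steps, "steps")
```

Comparing the full weight arrays is done here only to stop the simulation; in the protocol itself the partners never exchange their weights.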



2.3 Order parameters 

In order to describe the correlations between two Tree Parity Machines caused
by the synchronization process, one can look at the probability distribution of
the weight values in each hidden unit. It is given by $(2L + 1)^2$ variables

$$p_{a,b}^{i} = P\bigl(w_{i,j}^{A} = a \wedge w_{i,j}^{B} = b\bigr), \tag{2.10}$$

which are defined as the probability to find a weight with $w_{i,j}^{A} = a$ in A's Tree
Parity Machine and $w_{i,j}^{B} = b$ in B's neural network.

While these probabilities are only approximately given as relative frequencies
in simulations with finite N, their development can be calculated using exact
equations of motion in the limit $N \to \infty$ [17-19]. This method is explained in
detail in appendix B.

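In both cases, simulation and iterative calculation, the standard order
parameters [28], which are also used for the analysis of online learning, can be calculated
as functions of $p_{a,b}^{i}$:

$$Q_i^{A} = \frac{1}{N} \mathbf{w}_i^{A} \cdot \mathbf{w}_i^{A} = \sum_{a=-L}^{L} \sum_{b=-L}^{L} a^2\, p_{a,b}^{i}, \tag{2.11}$$

$$Q_i^{B} = \frac{1}{N} \mathbf{w}_i^{B} \cdot \mathbf{w}_i^{B} = \sum_{a=-L}^{L} \sum_{b=-L}^{L} b^2\, p_{a,b}^{i}, \tag{2.12}$$

$$R_i^{AB} = \frac{1}{N} \mathbf{w}_i^{A} \cdot \mathbf{w}_i^{B} = \sum_{a=-L}^{L} \sum_{b=-L}^{L} a\, b\, p_{a,b}^{i}. \tag{2.13}$$

Then the level of synchronization is given by the normalized overlap [28] between
two corresponding hidden units:

$$\rho_i^{AB} = \frac{\mathbf{w}_i^{A} \cdot \mathbf{w}_i^{B}}{\sqrt{\mathbf{w}_i^{A} \cdot \mathbf{w}_i^{A}}\, \sqrt{\mathbf{w}_i^{B} \cdot \mathbf{w}_i^{B}}} = \frac{R_i^{AB}}{\sqrt{Q_i^{A} Q_i^{B}}}. \tag{2.14}$$

Uncorrelated hidden units, e.g. at the beginning of the synchronization process,
have $\rho_i = 0$, while the maximum value $\rho_i = 1$ is reached for fully synchronized
weights. Consequently, $\rho_i$ is the most important quantity for analyzing the
process of synchronization.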
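
In a simulation, the joint distribution and the order parameters of one hidden unit can be estimated from relative frequencies. The sketch below assumes NumPy; the function names and the small example vector are hypothetical and serve only to illustrate equations (2.10) to (2.14).

```python
import numpy as np

def joint_distribution(wa_i, wb_i, L):
    """Relative frequencies p[a + L, b + L] of weight pairs (a, b) in one hidden unit."""
    p = np.zeros((2 * L + 1, 2 * L + 1))
    np.add.at(p, (wa_i + L, wb_i + L), 1.0)   # count each pair (w^A_ij, w^B_ij)
    return p / wa_i.size

def order_parameters(p, L):
    """Return (Q_A, Q_B, R_AB, rho) computed from the joint distribution p."""
    vals = np.arange(-L, L + 1)
    q_a = (vals[:, None] ** 2 * p).sum()                 # eq. (2.11)
    q_b = (vals[None, :] ** 2 * p).sum()                 # eq. (2.12)
    r_ab = (vals[:, None] * vals[None, :] * p).sum()     # eq. (2.13)
    # eq. (2.14); the overlap is set to 0 here if one weight vector vanishes (a convention of this sketch)
    rho = r_ab / np.sqrt(q_a * q_b) if q_a > 0 and q_b > 0 else 0.0
    return q_a, q_b, r_ab, rho

w_example = np.array([1, -2, 3, 0])                      # assumed toy weights with L = 3
print(order_parameters(joint_distribution(w_example, w_example, 3), 3))
```

For identical weight vectors the overlap evaluates to 1, and for a vector paired with its negative it evaluates to -1, in line with the interpretation of the overlap as a cosine.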

But it is also interesting to estimate the mutual information gained by the
partners during the process of synchronization. For this purpose one has to
calculate the entropy [29]

$$S_i^{AB} = -N \sum_{a=-L}^{L} \sum_{b=-L}^{L} p_{a,b}^{i} \ln p_{a,b}^{i} \tag{2.15}$$

of the joint weight distribution of A's and B's neural networks. Similarly the
entropy of the weights in a single hidden unit is given by

$$S_i^{A} = -N \sum_{a=-L}^{L} \left( \sum_{b=-L}^{L} p_{a,b}^{i} \right) \ln \left( \sum_{b=-L}^{L} p_{a,b}^{i} \right), \tag{2.16}$$

$$S_i^{B} = -N \sum_{b=-L}^{L} \left( \sum_{a=-L}^{L} p_{a,b}^{i} \right) \ln \left( \sum_{a=-L}^{L} p_{a,b}^{i} \right). \tag{2.17}$$

Of course, these equations assume that there are no correlations between
different weights in one hidden unit. This is correct in the limit $N \to \infty$, but not
necessarily for small systems.

Using (2.15), (2.16), and (2.17) the mutual information [29] of A's and B's
Tree Parity Machines can be calculated as

$$I^{AB} = \sum_{i=1}^{K} \left( S_i^{A} + S_i^{B} - S_i^{AB} \right). \tag{2.18}$$

At the beginning of the synchronization process, the partners only know the
weight configuration of their own neural network, so that $I^{AB} = 0$. But for fully
synchronized weight vectors this quantity is equal to the entropy of a single Tree
Parity Machine, which is given by

$$S_0 = K N \ln(2L + 1) \tag{2.19}$$

in the case of uniformly distributed weights.
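
The two limiting cases mentioned above can be checked numerically. The following sketch assumes NumPy and represents each hidden unit by a (2L+1) x (2L+1) array of joint probabilities; the function names and the parameter values K = 3, N = 100, L = 3 are illustrative assumptions.

```python
import numpy as np

def entropy(p, n):
    """-N * sum p ln p, using the convention 0 ln 0 = 0."""
    p = np.asarray(p, dtype=float).ravel()
    nz = p[p > 0]
    return -n * float(np.sum(nz * np.log(nz)))

def mutual_information(p_units, n):
    """Sum over hidden units of S_i^A + S_i^B - S_i^AB, eq. (2.18)."""
    total = 0.0
    for p in p_units:
        s_a = entropy(p.sum(axis=1), n)    # marginal over b, eq. (2.16)
        s_b = entropy(p.sum(axis=0), n)    # marginal over a, eq. (2.17)
        total += s_a + s_b - entropy(p, n)
    return total

K, N, L = 3, 100, 3
m = 2 * L + 1
uncorrelated = [np.full((m, m), 1.0 / m**2) for _ in range(K)]   # independent uniform weights
synchronized = [np.eye(m) / m for _ in range(K)]                 # identical uniform weights
print(mutual_information(uncorrelated, N), mutual_information(synchronized, N))
```

Under these assumptions the first value vanishes up to rounding and the second equals $K N \ln(2L+1)$, which is the entropy $S_0$ of equation (2.19).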



2.4 Neural cryptography 

The neural key-exchange protocol [13] is an application of neural synchronization.
Both partners A and B use a Tree Parity Machine with the same structure. The
parameters K, L and N are public. Each neural network starts with randomly
chosen weight vectors. These initial conditions are kept secret. During the
synchronization process, which is described in section 2.2, only the input vectors $\mathbf{x}_i$
and the total outputs $\tau^A$, $\tau^B$ are transmitted over the public channel. Therefore
each participant just knows the internal representation $(\sigma_1, \sigma_2, \ldots, \sigma_K)$ of their own
Tree Parity Machine. Keeping this information secret is essential for the security
of the key-exchange protocol. After achieving full synchronization A and B use
the weight vectors as common secret key.

The main problem of the attacker E is that the internal representations
$(\sigma_1, \sigma_2, \ldots, \sigma_K)$ of A's and B's Tree Parity Machines are not known to her. As
the movement of the weights depends on $\sigma_i$, it is important for a successful attack
to guess the state of the hidden units correctly. Of course, most known attacks
use this approach. But there are other possibilities and it is indeed possible that
a clever attack method will be found, which breaks the security of neural
cryptography completely. However, this risk exists for all cryptographic algorithms
except the one-time pad.



2.4.1 Simple attack 



For the simple attack [13] E just trains a third Tree Parity Machine with the
examples consisting of input vectors $\mathbf{x}_i$ and output bits $\tau^A$. These can be obtained
easily by intercepting the messages transmitted by the partners over the public
channel. E's neural network has the same structure as A's and B's and starts
with random initial weights, too.

In each time step the attacker calculates the output of her neural network.
Afterwards E uses the same learning rule as the partners, but $\tau^E$ is replaced by
$\tau^A$. Thus the update of the weights is given by one of the following equations:

• Hebbian learning rule:

$$w_{i,j}^{E+} = g\bigl(w_{i,j}^{E} + x_{i,j}\,\tau^A\,\Theta(\sigma_i^E \tau^A)\,\Theta(\tau^A \tau^B)\bigr). \tag{2.20}$$

• Anti-Hebbian learning rule:

$$w_{i,j}^{E+} = g\bigl(w_{i,j}^{E} - x_{i,j}\,\tau^A\,\Theta(\sigma_i^E \tau^A)\,\Theta(\tau^A \tau^B)\bigr). \tag{2.21}$$

• Random walk learning rule:

$$w_{i,j}^{E+} = g\bigl(w_{i,j}^{E} + x_{i,j}\,\Theta(\sigma_i^E \tau^A)\,\Theta(\tau^A \tau^B)\bigr). \tag{2.22}$$

So E uses the internal representation $(\sigma_1^E, \sigma_2^E, \ldots, \sigma_K^E)$ of her own network in
order to estimate A's, even if the total output is different. As $\tau^A \neq \tau^E$ indicates
that there is at least one hidden unit with $\sigma_i^A \neq \sigma_i^E$, this is certainly not the best
algorithm available for an attacker.
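
The simple attack can be added to a simulated key exchange with a few lines. The sketch below assumes NumPy and the Hebbian rule for all three networks; the helper names, the parameters and the seed are hypothetical, and the result of a single run says nothing about the success probability in general.

```python
import numpy as np

def outputs(w, x):
    h = (w * x).sum(axis=1)
    sigma = np.where(h > 0, 1, -1)          # h = 0 is mapped to -1
    return sigma, int(np.prod(sigma))

def hebbian(w, x, sigma, tau, L):
    """Hebbian step in place: move units with sigma_i == tau, then clip to [-L, L]."""
    w += tau * x * (sigma == tau)[:, None]
    np.clip(w, -L, L, out=w)

def simple_attack_run(K=3, N=20, L=3, seed=1, max_steps=20_000):
    """Return (t_sync, attacker_succeeded) for one simulated key exchange."""
    rng = np.random.default_rng(seed)
    wa, wb, we = (rng.integers(-L, L + 1, size=(K, N)) for _ in range(3))
    for t in range(1, max_steps + 1):
        x = rng.choice([-1, 1], size=(K, N))
        (sa, ta), (sb, tb), (se, _) = outputs(wa, x), outputs(wb, x), outputs(we, x)
        if ta == tb:
            hebbian(wa, x, sa, ta, L)
            hebbian(wb, x, sb, tb, L)
            hebbian(we, x, se, ta, L)       # E keeps her own sigma^E but uses tau^A, eq. (2.20)
        if np.array_equal(wa, wb):
            return t, bool(np.array_equal(we, wa))
    return None, False

t_sync, success = simple_attack_run()
print("t_sync =", t_sync, "attacker synchronized:", success)
```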



2.4.2 Geometric attack

The geometric attack [20] performs better than the simple attack, because E takes
$\tau^E$ and the local fields of her hidden units into account. In fact, it is the most
successful method for an attacker using only a single Tree Parity Machine.

Similar to the simple attack E tries to imitate B without being able to interact
with A. As long as $\tau^A = \tau^E$, this can be done by just applying the same learning
rule as the partners A and B. But in the case of $\tau^E \neq \tau^A$ E cannot stop A's update
of the weights. Instead the attacker tries to correct the internal representation of
her own Tree Parity Machine using the local fields $h_1^E, h_2^E, \ldots, h_K^E$ as additional
information. These quantities can be used to determine the level of confidence
associated with the output of each hidden unit [30]. As a low absolute value $|h_i^E|$
indicates a high probability of $\sigma_i^A \neq \sigma_i^E$, the attacker changes the output $\sigma_i^E$ of
the hidden unit with minimal $|h_i^E|$ and the total output $\tau^E$ before applying the
learning rule.

Of course, the geometric attack does not always succeed in estimating the
internal representation of A's Tree Parity Machine correctly. Sometimes there
are several hidden units with $\sigma_i^A \neq \sigma_i^E$. In this case the change of one output bit
is not enough. It is also possible that $\sigma_i^E = \sigma_i^A$ for the hidden unit with minimal
$|h_i^E|$, so that the geometric correction makes the result worse than before.
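
One attacker update with the geometric correction can be sketched as follows. The code assumes NumPy and the Hebbian learning rule; the function name `geometric_step` and the toy weights are hypothetical. It is meant to be called only in time steps in which the partners update their weights.

```python
import numpy as np

def geometric_step(we, x, tau_a, L):
    """One attacker update in a step with tau_A == tau_B; returns the sigma used by E."""
    h = (we * x).sum(axis=1)                 # local fields (normalization does not change the ranking)
    sigma = np.where(h > 0, 1, -1)
    if int(np.prod(sigma)) != tau_a:
        i = int(np.argmin(np.abs(h)))        # least confident hidden unit
        sigma[i] = -sigma[i]                 # flipping one unit also flips the total output
    we += tau_a * x * (sigma == tau_a)[:, None]   # Hebbian rule with tau_A (assumed choice)
    np.clip(we, -L, L, out=we)
    return sigma

# Toy example (assumed values): K = 3, N = 2, L = 3.
we = np.array([[3, 3], [1, -1], [-2, -2]])
x = np.ones((3, 2), dtype=int)
sigma_e = geometric_step(we, x, tau_a=-1, L=3)
print(sigma_e, we.tolist())
```

In this example the second hidden unit has a vanishing local field, so it is the one whose output is flipped before the learning rule is applied.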



2.4.3 Majority attack 



With the majority attack [21] E can improve her ability to predict the internal
representation of A's neural network. For that purpose the attacker uses an
ensemble of M Tree Parity Machines instead of a single neural network. At
the beginning of the synchronization process the weight vectors of all attacking
networks are chosen randomly, so that their average overlap is zero.

Similar to other attacks, E does not change the weights in time steps with
$\tau^A \neq \tau^B$, because the partners skip these input vectors, too. But for $\tau^A = \tau^B$ an
update is necessary and the attacker calculates the output bits $\tau^{E,m}$ of her Tree
Parity Machines. If the output bit $\tau^{E,m}$ of the m-th attacking network disagrees
with $\tau^A$, E searches the hidden unit i with minimal absolute local field $|h_i^{E,m}|$.
Then the output bits $\sigma_i^{E,m}$ and $\tau^{E,m}$ are inverted similarly to the geometric attack.
Afterwards the attacker counts the internal representations $(\sigma_1^{E,m}, \ldots, \sigma_K^{E,m})$ of
her Tree Parity Machines and selects the most common one. This majority vote
is then adopted by all attacking networks for the application of the learning rule.

But these identical updates create and amplify correlations between E's Tree
Parity Machines, which reduce the efficiency of the majority attack. Especially if
the attacking neural networks become fully synchronized, this method is reduced
to a geometric attack.

In order to keep the Tree Parity Machines as uncorrelated as possible, majority
attack and geometric attack are used alternately [21]. In even time steps the
majority vote is used for learning, but otherwise E only applies the geometric
correction. Therefore not all updates of the weight vectors are identical, so that
the overlap between them is reduced. Additionally, E replaces the majority attack
by the geometric attack in the first 100 time steps of the synchronization process.
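
A single majority-vote update for an ensemble of attacking networks can be sketched as below. The code assumes NumPy, the Hebbian learning rule and an ensemble stored as one array of shape (M, K, N); the function name and the toy ensemble are hypothetical. The alternation with geometric steps and the special treatment of the first 100 time steps are deliberately left out.

```python
from collections import Counter
import numpy as np

def majority_step(ws, x, tau_a, L):
    """One majority update in a step with tau_A == tau_B; ws has shape (M, K, N)."""
    h = (ws * x).sum(axis=2)                       # local fields, shape (M, K)
    sigma = np.where(h > 0, 1, -1)
    for m in range(ws.shape[0]):
        if np.prod(sigma[m]) != tau_a:             # geometric correction for network m
            i = np.argmin(np.abs(h[m]))
            sigma[m, i] *= -1
    votes = Counter(tuple(int(s) for s in row) for row in sigma)
    winner = np.array(votes.most_common(1)[0][0])  # most common internal representation
    ws += tau_a * x * (winner == tau_a)[:, None]   # every network adopts the majority vote
    np.clip(ws, -L, L, out=ws)
    return winner

# Toy ensemble (assumed values): M = 3 networks, K = 3, N = 2, L = 3.
ws = np.array([[[3, 3], [1, -1], [-2, -2]],
               [[3, 3], [1, -1], [-2, -2]],
               [[-1, -1], [2, 2], [1, 1]]])
x = np.ones((3, 2), dtype=int)
winner = majority_step(ws, x, tau_a=-1, L=3)
print(winner, ws[2].tolist())
```

Because all networks receive the same update, repeated use of this step makes them more similar, which is the correlation effect described above.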



2.4.4 Genetic attack 



The genetic attack [22] offers an alternative approach for the opponent, which is
not based on optimizing the prediction of the internal representation, but on an
evolutionary algorithm. E starts with only one randomly initialized Tree Parity
Machine, but she can use up to M neural networks.

Whenever the partners update the weights because of $\tau^A = \tau^B$ in a time step,
the following genetic algorithm is applied:

• As long as E has at most $M/2^{K-1}$ Tree Parity Machines, she determines all
$2^{K-1}$ internal representations $(\sigma_1^E, \ldots, \sigma_K^E)$ which reproduce the output $\tau^A$.
Afterwards these are used to update the weights in the attacking networks
according to the learning rule. By doing so E creates $2^{K-1}$ variants of each
Tree Parity Machine in this mutation step.

• But if E already has more than $M/2^{K-1}$ neural networks, only the fittest
Tree Parity Machines should be kept. This is achieved by discarding all
networks which predicted fewer than U outputs $\tau^A$ in the last V learning
steps, with $\tau^A = \tau^B$, successfully. A limit of U = 10 and a history of
V = 20 are used as default values for the selection step. Additionally, E
keeps at least 20 of her Tree Parity Machines.

The efficiency of the genetic attack mostly depends on the algorithm which
selects the fittest neural networks. In the ideal case the Tree Parity Machine
which has the same sequence of internal representations as A is never discarded.
Then the problem of the opponent E would be reduced to the synchronization
of K perceptrons and the genetic attack would certainly succeed. However, this
algorithm as well as other methods available for the opponent E are not perfect,
which is clearly shown in chapter 4.
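
The mutation step of the genetic attack can be sketched as follows. The code assumes NumPy and the Hebbian learning rule; the function names are hypothetical, and the selection step based on U and V is not included.

```python
from itertools import product
import numpy as np

def representations(K, tau):
    """All 2^(K-1) internal representations whose parity equals tau."""
    return [s for s in product((-1, 1), repeat=K) if np.prod(s) == tau]

def mutate(w, x, tau_a, L):
    """Create one updated copy of w per internal representation consistent with tau_A."""
    children = []
    for s in representations(w.shape[0], tau_a):
        sigma = np.array(s)
        # Hebbian update with the assumed representation (assumed rule), followed by g(w)
        child = np.clip(w + tau_a * x * (sigma == tau_a)[:, None], -L, L)
        children.append(child)
    return children

# Toy parent network (assumed values): K = 3, N = 2, L = 3.
parent = np.zeros((3, 2), dtype=int)
x = np.ones((3, 2), dtype=int)
children = mutate(parent, x, tau_a=1, L=3)
print(len(children), "variants created")
```

Each call multiplies the number of attacking networks by $2^{K-1}$, which is why the population has to be limited by the selection step described in the list above.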



Chapter 3 



Dynamics of the neural 
synchronization process 



Neural synchronization is a stochastic process consisting of discrete steps, in
which the weights of participating neural networks are adjusted according to the
algorithms presented in chapter 2. In order to understand why unidirectional
learning and bidirectional synchronization show different effects, it is reasonable
to take a closer look at the dynamics of these processes.

Although both are completely determined by the initial weight vectors $\mathbf{w}_i$
of the Tree Parity Machines and the sequence of random input vectors $\mathbf{x}_i$, one
cannot calculate the result for each initial condition, as there are too many except
for very small systems. Instead, the effect of the synchronization steps
on the overlap $\rho_i$ of two corresponding hidden units is analyzed. This order
parameter is defined as the cosine of the angle between the weight vectors [28]:

$$ \rho_i = \frac{\mathbf{w}_i^A \cdot \mathbf{w}_i^B}{\sqrt{\mathbf{w}_i^A \cdot \mathbf{w}_i^A}\,\sqrt{\mathbf{w}_i^B \cdot \mathbf{w}_i^B}} \,. $$

Attractive steps increase the overlap, while repulsive steps decrease it [19].

As the probabilities for both types of steps as well as the average step sizes
$\langle\Delta\rho_a\rangle$, $\langle\Delta\rho_r\rangle$ depend on the current overlap, neural synchronization can be
regarded as a random walk in $\rho$-space. Hence the average change of the overlap
$\langle\Delta\rho(\rho)\rangle$ shows the most important properties of the dynamics. Especially the
difference between bidirectional and unidirectional interaction is clearly visible.

As long as two Tree Parity Machines influence each other, repulsive steps
have only little effect on the process of synchronization. Therefore it is possible
to neglect this type of step in order to determine the scaling of the synchronization
time $t_{\mathrm{sync}}$. For that purpose a random walk model consisting of two corresponding
weights is analyzed [27].

But in the case of unidirectional interaction the higher frequency of repulsive
steps leads to completely different dynamics of the system, so that synchronization
is only possible by fluctuations. Hence the scaling of $\langle t_{\mathrm{sync}}\rangle$ changes to
an exponential increase with $L$. This effect is important for the cryptographic
application of neural synchronization, as it is essential for the security of the
neural key-exchange protocol.
3.1 Effect of the learning rules

The learning rules used for synchronizing Tree Parity Machines, which have been
presented in section 2.2, share a common structure. That is why they can be
described by a single equation

$$ w_{i,j}^{+} = g\big( w_{i,j} + f(\sigma_i, \tau^A, \tau^B)\, x_{i,j} \big) \tag{3.1} $$

with a function $f(\sigma, \tau^A, \tau^B)$, which can take the values $-1$, $0$, or $+1$. In the case
of bidirectional interaction it is given by

$$ f(\sigma, \tau^A, \tau^B) = \Theta(\sigma \tau^A)\, \Theta(\tau^A \tau^B) \begin{cases} \sigma & \text{Hebbian learning rule} \\ -\sigma & \text{anti-Hebbian learning rule} \\ 1 & \text{random walk learning rule} \end{cases} \,. \tag{3.2} $$

The common part $\Theta(\sigma \tau^A)\, \Theta(\tau^A \tau^B)$ of $f(\sigma, \tau^A, \tau^B)$ controls when the weight
vector of a hidden unit is adjusted. Because it is responsible for the occurrence of
attractive and repulsive steps as shown in section 3.1.2, all three learning rules
have similar effects on the overlap. But the second part, which influences the
direction of the movements, changes the distribution of the weights in the case
of Hebbian and anti-Hebbian learning. This results in deviations, especially for
small system sizes, which is the topic of section 3.1.1.

Equation (3.1) together with (3.2) also describes the update of the weights
for unidirectional interaction, after the output $\tau^E$ and the internal representation
$(\sigma_1^E, \sigma_2^E, \dots, \sigma_K^E)$ have been adjusted by the learning algorithm. That is why one
observes the same types of steps in this case.
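The common structure of (3.1) and (3.2) can be illustrated with a short sketch for a single weight. It assumes outputs and inputs in {-1, +1}, so that both step functions equal one exactly when $\sigma = \tau^A = \tau^B$, and it assumes that $g$ simply keeps a weight inside $[-L, +L]$. The helper names `bound`, `f` and `update_weight` are hypothetical.

```python
def bound(w, L):
    """g(w): keep a weight inside [-L, +L] (assumed clipping form of g)."""
    return max(-L, min(L, w))

def f(sigma, tau_a, tau_b, rule):
    """Eq. (3.2) for outputs in {-1, +1}."""
    if sigma != tau_a or tau_a != tau_b:
        return 0  # Theta(sigma tau^A) Theta(tau^A tau^B) vanishes: no update
    return {"hebbian": sigma, "anti_hebbian": -sigma, "random_walk": 1}[rule]

def update_weight(w, x, sigma, tau_a, tau_b, L, rule="hebbian"):
    """Eq. (3.1): w+ = g(w + f(sigma, tau^A, tau^B) * x)."""
    return bound(w + f(sigma, tau_a, tau_b, rule) * x, L)

print(update_weight(2, +1, +1, +1, +1, L=3))  # 3
print(update_weight(3, +1, +1, +1, +1, L=3))  # 3, the weight stays at the boundary
```

Only the last factor of `f` differs between the three rules, which matches the statement that they share the same conditions for an update.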



3.1.1 Distribution of the weights 



According to (3.2) the only difference between the learning rules is whether and
how the output $\sigma_i$ of a hidden unit affects $\Delta w_{i,j} = w_{i,j}^{+} - w_{i,j}$. Although this does
not change the qualitative effect of an update step, it influences the distribution
of the weights [22].

In the case of the Hebbian rule (2.6), A's and B's Tree Parity Machines learn
their own output. Therefore the direction in which the weight $w_{i,j}$ moves is
determined by the product $\sigma_i x_{i,j}$. As the output $\sigma_i$ is a function of all input
values, $x_{i,j}$ and $\sigma_i$ are correlated random variables. Thus the probabilities to
observe $\sigma_i x_{i,j} = +1$ or $\sigma_i x_{i,j} = -1$ are not equal, but depend on the value of the
corresponding weight $w_{i,j}$:

$$ P(\sigma_i x_{i,j} = 1) = \frac{1}{2} \left[ 1 + \mathrm{erf}\left( \frac{w_{i,j}}{\sqrt{N Q_i - w_{i,j}^2}} \right) \right] . \tag{3.3} $$
According to this equation, $\sigma_i x_{i,j} = \mathrm{sgn}(w_{i,j})$ occurs more often than the opposite,
$\sigma_i x_{i,j} = -\mathrm{sgn}(w_{i,j})$. Consequently, the Hebbian learning rule (2.6) pushes the
weights towards the boundaries at $-L$ and $+L$.

In order to quantify this effect the stationary probability distribution of the
weights for $t \to \infty$ is calculated using (3.3) for the transition probabilities. This
leads to [22]

$$ P(w_{i,j} = w) = p_0 \prod_{m=1}^{|w|} \frac{1 + \mathrm{erf}\left( \frac{m-1}{\sqrt{N Q_i - (m-1)^2}} \right)}{1 - \mathrm{erf}\left( \frac{m}{\sqrt{N Q_i - m^2}} \right)} \,. \tag{3.4} $$

Here the normalization constant $p_0$ is given by

$$ p_0 = \left( \sum_{w=-L}^{L} \prod_{m=1}^{|w|} \frac{1 + \mathrm{erf}\left( \frac{m-1}{\sqrt{N Q_i - (m-1)^2}} \right)}{1 - \mathrm{erf}\left( \frac{m}{\sqrt{N Q_i - m^2}} \right)} \right)^{-1} . \tag{3.5} $$

In the limit $N \to \infty$ the argument of the error functions vanishes, so that the
weights stay uniformly distributed. In this case the initial length

$$ \sqrt{Q_i(t = 0)} = \sqrt{\frac{L(L+1)}{3}} \tag{3.6} $$

of the weight vectors is not changed by the process of synchronization.

But for finite $N$, the probability distribution (3.4) itself depends on the order
parameter $Q_i$. Therefore its expectation value is given by the solution of the
following equation:

$$ Q_i = \sum_{w=-L}^{L} w^2\, P(w_{i,j} = w) \,. \tag{3.7} $$

Expanding it in terms of $N^{-1/2}$ results in [22]

$$ Q_i \approx \frac{L(L+1)}{3} + \frac{8L^4 + 16L^3 - 10L^2 - 18L + 9}{15 \sqrt{3\pi L(L+1)}} \, \frac{1}{\sqrt{N}} \tag{3.8} $$

as a first-order approximation of $Q_i$ for large system sizes. The asymptotic
behavior of this order parameter in the case of $1 \ll L \ll \sqrt{N}$ is given by

$$ Q_i \sim \frac{L(L+1)}{3} \left( 1 + \frac{8L}{5\sqrt{3\pi}} \, \frac{1}{\sqrt{N}} \right) . \tag{3.9} $$

Thus each application of the Hebbian learning rule increases the length of the
weight vectors $\mathbf{w}_i$ until a steady state is reached. The size of this effect depends
on $L/\sqrt{N}$ and disappears in the limit $L/\sqrt{N} \to 0$.
Figure 3.1: Length of the weight vectors in the steady state for K = 3 and
N = 1000. Symbols denote results averaged over 1000 simulations (legend: Hebbian
learning, anti-Hebbian learning, random walk) and lines show the first-order
approximation given in (3.8) and (3.10).



In the case of the anti-Hebbian rule (2.7), A's and B's Tree Parity Machines
learn the opposite of their own outputs. Therefore the weights are pulled away
from the boundaries instead of being pushed towards $\pm L$. Here the first-order
approximation of $Q_i$ is given by [22]

$$ Q_i \approx \frac{L(L+1)}{3} - \frac{8L^4 + 16L^3 - 10L^2 - 18L + 9}{15 \sqrt{3\pi L(L+1)}} \, \frac{1}{\sqrt{N}} \,, \tag{3.10} $$

which asymptotically converges to

$$ Q_i \sim \frac{L(L+1)}{3} \left( 1 - \frac{8L}{5\sqrt{3\pi}} \, \frac{1}{\sqrt{N}} \right) \tag{3.11} $$

in the case $1 \ll L \ll \sqrt{N}$. Hence applying the anti-Hebbian learning rule
decreases the length of the weight vectors $\mathbf{w}_i$ until a steady state is reached. As
before, $L/\sqrt{N}$ determines the size of this effect.

In contrast, the random walk rule (2.8) always uses a fixed output. Here
the weights stay uniformly distributed, as the random input values $x_{i,j}$ alone
determine the direction of the movements. Consequently, the length of the weight
vectors is always given by (3.6).
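To get a feeling for the size of this finite-size effect, the first-order expressions quoted above can be evaluated directly. The sketch below assumes the correction term of (3.8) and (3.10) with a positive sign for Hebbian learning, a negative sign for anti-Hebbian learning and no correction for the random walk rule. The function name `q_first_order` is hypothetical.

```python
import math

def q_first_order(L, N, rule):
    """First-order approximation of Q_i, cf. (3.8) and (3.10)."""
    q0 = L * (L + 1) / 3.0                      # uniform weights, cf. (3.6)
    corr = (8 * L**4 + 16 * L**3 - 10 * L**2 - 18 * L + 9) / (
        15.0 * math.sqrt(3.0 * math.pi * L * (L + 1)) * math.sqrt(N))
    sign = {"hebbian": +1, "anti_hebbian": -1, "random_walk": 0}[rule]
    return q0 + sign * corr

# Length sqrt(Q_i) of the weight vectors for L = 5 and N = 1000
for rule in ("hebbian", "anti_hebbian", "random_walk"):
    print(rule, math.sqrt(q_first_order(5, 1000, rule)))
```

As the text notes, this approximation is only expected to be reliable while $L/\sqrt{N}$ is small.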

Figure 3.1 shows that the theoretical predictions are in good quantitative
agreement with simulation results as long as $L^2$ is small compared to the system
size $N$. The deviations for large $L$ are caused by higher-order terms which are
ignored in (3.8) and (3.10).
Figure 3.2: Time evolution of the weight distribution in the case of synchronization
with K = 3 and L = 5, obtained in 100 simulations consisting of 100 pairs
of Tree Parity Machines.

Of course, the change of the weight distribution is also directly visible in the
relative entropy $S^A/S_0$ as shown in figure 3.2. While the weights are always
uniformly distributed at the beginning of the synchronization process, so that
$S^A = S_0$, only the random walk learning rule preserves this property. Otherwise
$S^A$ decreases until the length of the weight vectors reaches its stationary state
after a few steps. Therefore the transient has only little influence on the process
of synchronization and one can assume a constant value of both $Q_i$ and $S^A$.

In the limit $N \to \infty$, however, a system using Hebbian or anti-Hebbian learning
exhibits the same dynamics as observed in the case of the random walk rule for
all system sizes. Consequently, there are two possibilities to determine the
properties of neural synchronization without interfering finite-size effects. First, one
can run simulations for the random walk learning rule and moderate system sizes.
Second, the evolution of the probabilities $p_{a,b}$, which describe the distribution of
the weights in two corresponding hidden units, can be calculated iteratively for
$N \to \infty$. Both methods have been used in order to obtain the results presented
in this thesis.

3.1.2 Attractive and repulsive steps 

As the internal representation $(\sigma_1, \sigma_2, \dots, \sigma_K)$ is not visible to other neural
networks, two types of synchronization steps are possible:

• For $\tau^A = \sigma_i^A = \sigma_i^B = \tau^B$ the weights of both corresponding hidden units are
moved in the same direction. As long as both weights, $w_{i,j}^A$ and $w_{i,j}^B$, stay
in the range between $-L$ and $+L$, their distance $d_{i,j} = |w_{i,j}^A - w_{i,j}^B|$ remains
unchanged. But if one of them hits the boundary at $\pm L$, it is reflected, so
that $d_{i,j}$ decreases by one, until $d_{i,j} = 0$ is reached. Therefore a sequence of
these attractive steps leads to full synchronization eventually.

• If $\tau^A = \tau^B$, but $\sigma_i^A \neq \sigma_i^B$, only the weight vector of one hidden unit is
changed. Two corresponding weights which have already been synchronized
before, $w_{i,j}^A = w_{i,j}^B$, are separated by this movement, unless this is
prevented by the boundary conditions. Consequently, this repulsive step
reduces the correlations between corresponding weights and impedes the
process of synchronization.

In all other situations the weights of the i-th hidden unit in A's and B's Tree
Parity Machines are not modified at all.

In the limit $N \to \infty$ the effects of attractive and repulsive steps can be
described by the following equations of motion for the probability distribution
$p_{a,b} = P(w_{i,j}^A = a \wedge w_{i,j}^B = b)$ of the weights [17-19]. In attractive steps the
weights perform an anisotropic diffusion

$$ p_{a,b}^{+} = \frac{1}{2} \left( p_{a+1,b+1} + p_{a-1,b-1} \right) \tag{3.12} $$

and move on the diagonals of a $(2L+1) \times (2L+1)$ square lattice. Repulsive
steps, instead, are equal to normal diffusion steps

$$ p_{a,b}^{+} = \frac{1}{4} \left( p_{a+1,b} + p_{a-1,b} + p_{a,b+1} + p_{a,b-1} \right) \tag{3.13} $$

on the same lattice. However, one has to take the reflecting boundary conditions
into account. Therefore (3.12) and (3.13) are only defined for $-L < a, b < +L$.
Similar equations for the weights on the boundary can be found in appendix B.

Starting from the development of the variables $p_{a,b}$ one can calculate the
change of the overlap in both types of steps. In general, the results

$$ \Delta\rho_a = \frac{3}{L(L+1)} \left( 1 - \sum_{j=-L}^{L} (2j+2)\, p_{L,j} + p_{L,L} \right) \tag{3.14} $$

for attractive steps and

$$ \Delta\rho_r = -\frac{3}{L(L+1)} \sum_{j=-L}^{L} \frac{j}{2} \left( p_{L,j} - p_{-L,j} \right) \tag{3.15} $$

for repulsive steps are not only functions of the current overlap, but also depend
explicitly on the probability distribution of the weights. That is why $\Delta\rho_a(\rho)$
and $\Delta\rho_r(\rho)$ are random variables, whose properties have to be determined in
simulations of finite systems or iterative calculations for $N \to \infty$.
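The iterative calculation can be sketched by pushing probability mass through one step of the weight dynamics. The boundary equations of appendix B are not reproduced here, so this sketch assumes that a weight which would leave $[-L, +L]$ simply stays at the boundary, and that the overlap is normalized by $Q = L(L+1)/3$ for uniformly distributed weights. All function names are hypothetical.

```python
def uniform(L):
    """Uncorrelated weights (rho = 0)."""
    n = 2 * L + 1
    return {(a, b): 1.0 / n**2 for a in range(-L, L + 1) for b in range(-L, L + 1)}

def synchronized(L):
    """Identical weights (rho = 1)."""
    n = 2 * L + 1
    return {(a, a): 1.0 / n for a in range(-L, L + 1)}

def clip(w, L):
    return max(-L, min(L, w))

def attractive_step(p, L):
    """Both weights move in the same random direction, cf. (3.12)."""
    new = {}
    for (a, b), prob in p.items():
        for x in (-1, +1):
            key = (clip(a + x, L), clip(b + x, L))
            new[key] = new.get(key, 0.0) + 0.5 * prob
    return new

def repulsive_step(p, L):
    """Only one of the two weights moves, cf. (3.13)."""
    new = {}
    for (a, b), prob in p.items():
        for x in (-1, +1):
            for key in ((clip(a + x, L), b), (a, clip(b + x, L))):
                new[key] = new.get(key, 0.0) + 0.25 * prob
    return new

def overlap(p, L):
    """rho = sum_ab a*b*p_ab / Q with Q = L(L+1)/3 (uniform weights assumed)."""
    return sum(a * b * prob for (a, b), prob in p.items()) / (L * (L + 1) / 3.0)

L = 10
print(overlap(attractive_step(uniform(L), L), L))                  # gain from rho = 0
print(overlap(repulsive_step(synchronized(L), L), L) - 1.0)        # loss from rho = 1
```

Applying `attractive_step` and `repulsive_step` repeatedly, with the appropriate probabilities, gives one way to follow $p_{a,b}$ and the overlap over time.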
Figure 3.3: Effect of attractive (upper curve) and repulsive steps (lower curve)
for K = 3 and L = 10. Symbols represent averages over 1000 simulations using
N = 100 and the random walk learning rule. The line shows the corresponding
result of 1000 iterative calculations for synchronization in the limit $N \to \infty$.



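Figure 3.3 shows that each attractive step increases the overlap on average.
At the beginning of the synchronization it has its maximum effect [31],

$$ \Delta\rho_a(\rho = 0) = \frac{12 L}{(L+1)(2L+1)^2} \sim \frac{3}{L^2} \,, \tag{3.16} $$

as the weights are uncorrelated,

$$ p_{a,b}(\rho = 0) = \frac{1}{(2L+1)^2} \,. \tag{3.17} $$

But as soon as full synchronization is reached, an attractive step cannot increase
the overlap further, so that $\Delta\rho_a(\rho = 1) = 0$. Thus $\rho = 1$ is a fixed point for a
sequence of these steps.

In contrast, a repulsive step reduces a previously gained positive overlap on
average. Its maximum effect [31],

$$ \Delta\rho_r(\rho = 1) = -\frac{3}{(L+1)(2L+1)} \sim -\frac{3}{2L^2} \,, \tag{3.18} $$

is reached in the case of fully synchronized weights,

$$ p_{a,b}(\rho = 1) = \begin{cases} (2L+1)^{-1} & \text{for } a = b \\ 0 & \text{for } a \neq b \end{cases} \,. \tag{3.19} $$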
Figure 3.4: Effect of attractive (upper curves) and repulsive steps (lower curves)
for different learning rules with K = 3 and L = 10. Symbols denote averages over
1000 simulations, while the lines show the results of 1000 iterative calculations.

But if the weights are uncorrelated, $\rho = 0$, a repulsive step has no effect. Hence
$\rho = 0$ is a fixed point for a sequence of these steps.

It is clearly visible in figure 3.3 that the results obtained by simulations with
the random walk learning rule and iterative calculations for $N \to \infty$ are in
good quantitative agreement. This shows that both $\langle\Delta\rho_a(\rho)\rangle$ and $\langle\Delta\rho_r(\rho)\rangle$ are
independent of the system size $N$. Additionally, the choice of the synchronization
algorithm does not matter, which indicates a similar distribution of the weights for
both unidirectional and bidirectional interaction. Consequently, the differences
observed between learning and synchronization are caused by the probabilities of
attractive and repulsive steps, but not their effects.

However, the distribution of the weights is obviously altered by Hebbian and
anti-Hebbian learning in finite systems, so that the average change of the overlap in
attractive and repulsive steps is different from the result for the random walk
learning rule. This is clearly visible in figure 3.4. In the case of the Hebbian
learning rule the effect of both types of steps is enhanced, but for anti-Hebbian
learning it is reduced. It is even possible that a repulsive step has an attractive
effect on average, if the overlap $\rho$ is small. This explains why one observes
finite-size effects.

Using the equations (3.16) and (3.18) one can obtain the rescaled quantities
$\langle\Delta\rho_a(\rho)\rangle / \Delta\rho_a(0)$ and $\langle\Delta\rho_r(\rho)\rangle / \Delta\rho_r(1)$. They become asymptotically
independent of the synaptic depth $L$ in the limit $L \to \infty$ as shown in figure 3.5 and
figure 3.6. Therefore these two scaling functions together with $\Delta\rho_a(0)$ and $\Delta\rho_r(1)$
are sufficient to describe the effect of attractive and repulsive steps [31].
Figure 3.6: Scaling behavior of the average step size $\langle\Delta\rho_r\rangle$ for repulsive steps.
These results were obtained in 1000 iterative calculations for K = 3 and $N \to \infty$.
3.2 Transition probabilities

While $\langle\Delta\rho_a\rangle$ and $\langle\Delta\rho_r\rangle$ are identical for synchronization and learning, the
probabilities of attractive and repulsive steps depend on the type of interaction between
the neural networks. Therefore these quantities are important for the differences
between partners and attackers in neural cryptography.

A repulsive step can only occur if two corresponding hidden units have different
$\sigma_i$. The probability for this event is given by the well-known generalization
error

$$ \epsilon_i = \frac{1}{\pi} \arccos \rho_i \tag{3.20} $$

of the perceptron. However, disagreeing hidden units alone are not sufficient for a
repulsive step, as the weights of all neural networks are only changed if $\tau^A = \tau^B$.
Therefore the probability of a repulsive step is given by

$$ P_r = P(\sigma_i^A \neq \sigma_i^{B/E} \,|\, \tau^A = \tau^B) \,, \tag{3.21} $$

after possible corrections of the output bits have been applied in the case of
advanced learning algorithms. Similarly, one finds

$$ P_a = P(\tau^A = \sigma_i^A = \sigma_i^{B/E} \,|\, \tau^A = \tau^B) \tag{3.22} $$

for the probability of attractive steps.



3.2.1 Simple attack 

In the case of the simple attack, the outputs $\sigma_i^E$ of E's Tree Parity Machine are
not corrected before the application of the learning rule and the update of the
weights occurs independent of $\tau^E$, as mutual interaction is not possible. Therefore
a repulsive step in the i-th hidden unit occurs with probability [19]

$$ P_r^E = \epsilon_i \,. \tag{3.23} $$

But if two corresponding hidden units agree on their output $\sigma_i$, this does not
always lead to an attractive step, because $\sigma_i = \tau$ is another necessary condition
for an update of the weights. Thus the probability of an attractive step is given
by [31]

$$ P_a^E = \frac{1}{2} (1 - \epsilon_i) \tag{3.24} $$

for $K > 1$. In the special case $K = 1$, however, $\sigma_i = \tau$ is always true, so that this
type of step occurs with double frequency: $P_a^E = 1 - \epsilon_i$.
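For illustration, the step probabilities of the simple attack can be evaluated as a function of the overlap. This sketch only combines (3.20), (3.23) and (3.24); the function names are hypothetical.

```python
import math

def generalization_error(rho):
    """Eq. (3.20): probability that two hidden units with overlap rho disagree."""
    return math.acos(rho) / math.pi

def simple_attack_probabilities(rho, K=3):
    """Return (P_a^E, P_r^E) for the simple attack, cf. (3.23) and (3.24)."""
    eps = generalization_error(rho)
    p_r = eps
    p_a = (1.0 - eps) if K == 1 else 0.5 * (1.0 - eps)
    return p_a, p_r

for rho in (0.0, 0.5, 0.9):
    print(rho, simple_attack_probabilities(rho))
```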
3.2.2 Synchronization

In contrast, mutual interaction is an integral part of bidirectional synchronization.
When an odd number of hidden units disagrees on the output, $\tau^A \neq \tau^B$ signals
that adjusting the weights would have a repulsive effect on at least one of the
weight vectors. Therefore A and B skip this synchronization step.

But when an even number of hidden units disagrees on the output, the partners
cannot detect repulsive steps by comparing $\tau^A$ and $\tau^B$. Additionally, identical
internal representations in both networks are more likely than two or more
different output bits $\sigma_i^A \neq \sigma_i^B$, if there are already some correlations between the
Tree Parity Machines. Consequently, the weights are updated if $\tau^A = \tau^B$.

In the case of identical overlap in all $K$ hidden units, $\epsilon_i = \epsilon$, the probability
of this event is given by

$$ P_u = P(\tau^A = \tau^B) = \sum_{i=0}^{K/2} \binom{K}{2i} (1-\epsilon)^{K-2i} \epsilon^{2i} \,. \tag{3.25} $$

Of course, only attractive steps are possible if two perceptrons learn from each
other ($K = 1$). But for synchronization of Tree Parity Machines with $K > 1$, the
probabilities of attractive and repulsive steps are given by:

$$ P_a^B = \frac{1}{2 P_u} \sum_{i=0}^{(K-1)/2} \binom{K-1}{2i} (1-\epsilon)^{K-2i} \epsilon^{2i} \,, \tag{3.26} $$

$$ P_r^B = \frac{1}{P_u} \sum_{i=1}^{K/2} \binom{K-1}{2i-1} (1-\epsilon)^{K-2i} \epsilon^{2i} \,. \tag{3.27} $$

In the case of three hidden units ($K = 3$), which is the usual choice for the neural
key-exchange protocol, this leads to

$$ P_a^B = \frac{1}{2} \, \frac{(1-\epsilon)^3 + (1-\epsilon)\epsilon^2}{(1-\epsilon)^3 + 3(1-\epsilon)\epsilon^2} \,, \tag{3.28} $$

$$ P_r^B = \frac{2(1-\epsilon)\epsilon^2}{(1-\epsilon)^3 + 3(1-\epsilon)\epsilon^2} \,. \tag{3.29} $$

Figure 3.7 shows that repulsive steps occur more frequently in E's Tree Parity
Machine than in A's or B's for equal overlap $0 < \rho < 1$. That is why the partners
A and B have a clear advantage over a simple attacker in neural cryptography.
But this difference becomes smaller and smaller with increasing $K$. Consequently,
a large number of hidden units is detrimental for the security of the neural
key-exchange protocol against the simple attack.
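The sums for general $K$ are easy to evaluate numerically, which allows a comparison with the simple attack for any number of hidden units. The sketch below assumes identical $\epsilon$ in all hidden units and reads the upper summation limits of (3.25) to (3.27) as integer parts; the function name `sync_probabilities` is hypothetical.

```python
from math import comb

def sync_probabilities(eps, K=3):
    """Return (P_u, P_a^B, P_r^B) from (3.25)-(3.27) for K > 1."""
    p_u = sum(comb(K, 2 * i) * (1 - eps) ** (K - 2 * i) * eps ** (2 * i)
              for i in range(K // 2 + 1))
    p_a = sum(comb(K - 1, 2 * i) * (1 - eps) ** (K - 2 * i) * eps ** (2 * i)
              for i in range((K - 1) // 2 + 1)) / (2 * p_u)
    p_r = sum(comb(K - 1, 2 * i - 1) * (1 - eps) ** (K - 2 * i) * eps ** (2 * i)
              for i in range(1, K // 2 + 1)) / p_u
    return p_u, p_a, p_r

# Repulsive steps: partners (P_r^B) versus simple attacker (P_r^E = eps)
for K in (3, 5, 7):
    print(K, sync_probabilities(0.3, K)[2], 0.3)
```

Under these assumptions every update is either attractive with $\sigma_i = \tau$, attractive without a weight change, or repulsive, so $2 P_a^B + P_r^B = 1$ serves as a quick consistency check.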
Figure 3.7: Probability $P_r^B(\rho)$ of repulsive steps for synchronization with mutual
interaction under the condition $\tau^A = \tau^B$. The dotted line shows $P_r^E(\rho)$ for a
simple attack.

Figure 3.8: Prediction error $\epsilon_i^p$ as a function of the local field $h_i^E$ for different
values of the overlap $\rho_i^{AE}$ and $Q_i = 1$.
3.2.3 Geometric attack



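However, E can do better than simple learning by taking the local field into
account. Then the probability of $\sigma_i^E \neq \sigma_i^A$ is given by the prediction error [30]

$$ \epsilon_i^p = \frac{1}{2} \left[ 1 - \mathrm{erf}\left( \frac{\rho_i}{\sqrt{2(1-\rho_i^2)}} \, \frac{|h_i^E|}{\sqrt{Q_i^E}} \right) \right] \tag{3.30} $$

of the perceptron, which depends not only on the overlap $\rho_i$, but also on the
absolute value $|h_i^E|$ of the local field. This quantity is a strictly decreasing function
of $|h_i^E|$ as shown in figure 3.8. Therefore the geometric attack is often able to
find the hidden unit with $\sigma_i^E \neq \sigma_i^A$ by searching for the minimum of $|h_i^E|$. If only
the i-th hidden unit disagrees and all others have $\sigma_j^E = \sigma_j^A$, the probability for a
successful correction of the internal representation by using the geometric attack
is given by [22]

$$ P_g = \int_0^\infty \prod_{j \neq i} \left( \int_{h_i}^\infty \frac{2}{\sqrt{2\pi Q_j}} \, \frac{1 - \epsilon_j^p(h_j)}{1 - \epsilon_j} \, e^{-\frac{h_j^2}{2 Q_j}} \, dh_j \right) \frac{2}{\sqrt{2\pi Q_i}} \, \frac{\epsilon_i^p(h_i)}{\epsilon_i} \, e^{-\frac{h_i^2}{2 Q_i}} \, dh_i \,. \tag{3.31} $$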
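The integral for $P_g$ has no simple closed form, but it can be estimated numerically. The following sketch uses the trapezoidal rule and assumes identical order parameters in all hidden units ($Q_j = q$, same overlap $\rho < 1$), with the prediction error taken from (3.30). The cutoff `h_max`, the grid size and the function names are assumptions made for this example.

```python
import math

def prediction_error(h, rho, q=1.0):
    """Eq. (3.30): probability of disagreement given the local field h."""
    return 0.5 * (1.0 - math.erf(rho / math.sqrt(2.0 * (1.0 - rho**2))
                                 * abs(h) / math.sqrt(q)))

def geometric_success(rho, K=3, q=1.0, h_max=8.0, steps=4000):
    """Trapezoidal estimate of P_g, cf. (3.31), for identical Q_j = q and 0 <= rho < 1."""
    eps = math.acos(rho) / math.pi
    dh = h_max * math.sqrt(q) / steps
    hs = [i * dh for i in range(steps + 1)]
    gauss = [2.0 / math.sqrt(2.0 * math.pi * q) * math.exp(-h * h / (2.0 * q)) for h in hs]
    # densities of |h| for agreeing and for disagreeing hidden units
    agree = [g * (1.0 - prediction_error(h, rho, q)) / (1.0 - eps) for g, h in zip(gauss, hs)]
    differ = [g * prediction_error(h, rho, q) / eps for g, h in zip(gauss, hs)]
    # tail[i] approximates the inner integral from hs[i] to infinity
    tail = [0.0] * (steps + 1)
    for i in range(steps - 1, -1, -1):
        tail[i] = tail[i + 1] + 0.5 * (agree[i] + agree[i + 1]) * dh
    integrand = [tail[i] ** (K - 1) * differ[i] for i in range(steps + 1)]
    return sum(0.5 * (integrand[i] + integrand[i + 1]) * dh for i in range(steps))

for rho in (0.0, 0.5, 0.9):
    print(rho, geometric_success(rho))
```

For $\rho = 0$ the local field carries no information, so the estimate should approach $1/K$, the chance of picking the right hidden unit at random.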



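In the case of identical order parameters $Q = Q_j^E$ and $R = R_j^{AE}$ this equation can
be easily extended to $k$ out of $K$ hidden units with different outputs $\sigma_j^A \neq \sigma_j^E$.
Then the probability for successful correction of $\sigma_i^E \neq \sigma_i^A$ is given by

$$ P_k^{+} = \int_0^\infty \left( \int_{h_i}^\infty \frac{2}{\sqrt{2\pi Q}} \, \frac{1 - \epsilon^p(h)}{1 - \epsilon} \, e^{-\frac{h^2}{2Q}} \, dh \right)^{K-k} \left( \int_{h_i}^\infty \frac{2}{\sqrt{2\pi Q}} \, \frac{\epsilon^p(h)}{\epsilon} \, e^{-\frac{h^2}{2Q}} \, dh \right)^{k-1} \frac{2}{\sqrt{2\pi Q}} \, \frac{\epsilon^p(h_i)}{\epsilon} \, e^{-\frac{h_i^2}{2Q}} \, dh_i \,. \tag{3.32} $$

Using a similar equation the probability for an erroneous correction of $\sigma_i^E = \sigma_i^A$
can be calculated, too:

$$ P_k^{-} = \int_0^\infty \left( \int_{h_i}^\infty \frac{2}{\sqrt{2\pi Q}} \, \frac{1 - \epsilon^p(h)}{1 - \epsilon} \, e^{-\frac{h^2}{2Q}} \, dh \right)^{K-k-1} \left( \int_{h_i}^\infty \frac{2}{\sqrt{2\pi Q}} \, \frac{\epsilon^p(h)}{\epsilon} \, e^{-\frac{h^2}{2Q}} \, dh \right)^{k} \frac{2}{\sqrt{2\pi Q}} \, \frac{1 - \epsilon^p(h_i)}{1 - \epsilon} \, e^{-\frac{h_i^2}{2Q}} \, dh_i \,. \tag{3.33} $$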



Taking all possible internal representations of A's and E's neural networks
into account, the probability of repulsive steps consists of three parts in the case
of the geometric attack.

• If the number of hidden units with $\sigma_i^E \neq \sigma_i^A$ is even, no geometric correction
happens at all. This is similar to bidirectional synchronization, so that one
finds

$$ P_{r,1}^E = \sum_{i=1}^{K/2} \binom{K-1}{2i-1} (1-\epsilon)^{K-2i} \epsilon^{2i} \,. \tag{3.34} $$

• It is possible that the hidden unit with the minimum $|h_i^E|$ has the same
output as its counterpart in A's Tree Parity Machine. Then the geometric
correction increases the deviation of the internal representations. The
second part of $P_r^E$ takes this event into account:

$$ P_{r,2}^E = \sum_{i=1}^{K/2} \binom{K-1}{2i-1} P_{2i-1}^{-} (1-\epsilon)^{K-2i+1} \epsilon^{2i-1} \,. \tag{3.35} $$

• Similarly the geometric attack does not fix a deviation in the i-th hidden
unit, if the output of another one is flipped instead. Indeed, this causes a
repulsive step with probability

$$ P_{r,3}^E = \sum_{i=0}^{(K-1)/2} \binom{K-1}{2i} \left( 1 - P_{2i+1}^{+} \right) (1-\epsilon)^{K-2i-1} \epsilon^{2i+1} \,. \tag{3.36} $$



Thus the probabilities of attractive and repulsive steps in the i-th hidden unit
for $K > 1$ and identical order parameters are given by

$$ P_a^E = \frac{1}{2} \left( 1 - \sum_{j=1}^{3} P_{r,j}^E \right) , \tag{3.37} $$

$$ P_r^E = \sum_{j=1}^{3} P_{r,j}^E \,. \tag{3.38} $$

In the case $K = 1$, however, only attractive steps occur, because the algorithm
of the geometric attack is then able to correct all deviations. And especially for
$K = 3$ one can calculate these probabilities using (3.31) instead of the general
equations, which yields [22]

$$ P_a^E = \frac{1}{2} (1 + 2P_g)(1-\epsilon)^2 \epsilon + \frac{1}{2} (1-\epsilon)^3 + \frac{1}{2} (1-\epsilon)\epsilon^2 + \frac{1}{6} \epsilon^3 \,, \tag{3.39} $$

$$ P_r^E = 2(1-P_g)(1-\epsilon)^2 \epsilon + 2(1-\epsilon)\epsilon^2 + \frac{2}{3} \epsilon^3 \,. \tag{3.40} $$

As shown in figure 3.9, $P_r^E$ grows if the number of hidden units is increased. It
is even possible that the geometric attack performs worse than the simple attack
at the beginning of the synchronization process ($\epsilon \approx 0.5$). While this behavior
is similar to that observed in figure 3.7, $P_r^E$ is still higher than $P_r^B$ for identical
$K$. Consequently, even this advanced algorithm for unidirectional learning has a
disadvantage compared to bidirectional synchronization, which is clearly visible
in figure 3.10.
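For $K = 3$ the step probabilities of the geometric attack depend only on $\epsilon$ and $P_g$. The sketch below assumes the form $P_r^E = 2(1-P_g)(1-\epsilon)^2\epsilon + 2(1-\epsilon)\epsilon^2 + \frac{2}{3}\epsilon^3$ for the repulsive steps and the relation $P_a^E = \frac{1}{2}(1 - P_r^E)$, which holds if every hidden unit either agrees or disagrees after the correction. The value of `p_g` has to be supplied by the caller, and the function name is hypothetical.

```python
def geometric_attack_k3(eps, p_g):
    """Return (P_a^E, P_r^E) of the geometric attack for K = 3."""
    p_r = (2.0 * (1.0 - p_g) * (1.0 - eps) ** 2 * eps   # one deviation, wrong unit flipped
           + 2.0 * (1.0 - eps) * eps ** 2               # two deviations, no correction
           + 2.0 / 3.0 * eps ** 3)                      # three deviations, another unit flipped
    p_a = 0.5 * (1.0 - p_r)
    return p_a, p_r

# At the start of the process (eps = 0.5) the local field is uninformative, p_g = 1/3
print(geometric_attack_k3(0.5, 1.0 / 3.0))
```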
Figure 3.10: Probability of repulsive steps for Tree Parity Machines with K = 3
hidden units and different types of interaction.



3.3 Dynamics of the weights 

In each attractive step corresponding weights of A's and B's Tree Parity Machines
move in the same direction, which is chosen with equal probability in the case of
the random walk learning rule. The same is true for Hebbian and anti-Hebbian
learning in the limit $N \to \infty$ as shown in section 3.1.1. Of course, repulsive
steps disturb this synchronization process. But for small overlap they have little
effect, while they occur only seldom in the case of large $\rho$. That is why one can
neglect repulsive steps in some situations and consequently describe neural
synchronization as an ensemble of random walks with reflecting boundaries, driven
by pairwise identical random signals.

Figure 3.11: Random walks with reflecting boundaries.



This leads to a simple model for a pair of weights, which is shown in
figure 3.11 [27]. Two random walks corresponding to $w_{i,j}^A$ and $w_{i,j}^B$ can move on a
one-dimensional line with $m = 2L + 1$ sites. In each step a direction, either left
or right, is chosen randomly. Then the random walkers move in this direction.
If one of them hits the boundary, it is reflected, so that its own position remains
unchanged. As this does not affect the other random walker, which moves
towards the first one, the distance $d$ between them shrinks by 1 at each reflection.
Otherwise $d$ remains constant.

The most important quantity of this model is the synchronization time $T$ of
the two random walkers, which is defined as the number of steps needed to reach
$d = 0$ starting with random initial positions. In order to calculate the mean value
$\langle T \rangle$ and analyze the probability distribution $P(T = t)$, this process is divided into
independent parts, each of them with constant distance $d$. Their duration $S_{d,z}$ is
given by the time between two reflections. Of course, this quantity depends not
only on the distance $d$, but also on the initial position $z = L + \min(w_{i,j}^A, w_{i,j}^B) + 1$
of the left random walker.
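The model is simple enough to simulate directly. The following sketch assumes sites numbered from 1 to $m$ and independent, uniformly chosen starting positions for both walkers; the function name `sync_time` and the seed are choices made for this example.

```python
import random

def sync_time(m, rng):
    """Steps until two walkers on sites 1..m, driven by the same random direction, meet."""
    a, b = rng.randint(1, m), rng.randint(1, m)
    t = 0
    while a != b:
        step = rng.choice((-1, +1))
        # a walker that would leave the line stays where it is (reflection)
        a = min(m, max(1, a + step))
        b = min(m, max(1, b + step))
        t += 1
    return t

rng = random.Random(0)
samples = [sync_time(7, rng) for _ in range(10000)]
print(sum(samples) / len(samples))  # estimate of <T> for m = 7 (L = 3)
```

Because both walkers always receive the same step, their distance can only shrink, and it does so exactly when one of them is held back by a boundary.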



3.3.1 Waiting time for a reflection 

If the first move is to the right, a reflection only occurs for $z = m - d$. Otherwise,
the synchronization process continues as if the initial position had been $z + 1$.
In this case the average number of steps with distance $d$ is given by $\langle S_{d,z+1} + 1 \rangle$.
Similarly, if the two random walkers move to the left in the first step, this quantity
is equal to $\langle S_{d,z-1} + 1 \rangle$. Averaging over both possibilities leads to the following
difference equation [27]:

$$ \langle S_{d,z} \rangle = \frac{1}{2} \langle S_{d,z-1} \rangle + \frac{1}{2} \langle S_{d,z+1} \rangle + 1 \,. \tag{3.41} $$

Reflections are only possible if the current position $z$ is either $1$ or $m - d$. In
both situations $d$ changes with probability $\frac{1}{2}$ in the next step, which is taken into
account by using the boundary conditions

$$ S_{d,0} = 0 \quad \text{and} \quad S_{d,m-d+1} = 0 \,. \tag{3.42} $$

As (3.41) is identical to the classical ruin problem [32], its solution is given by

$$ \langle S_{d,z} \rangle = (m - d + 1) z - z^2 \,. \tag{3.43} $$
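The closed-form solution can be cross-checked by solving the difference equation exactly. This sketch uses a shooting approach with rational arithmetic: since the mean waiting time at the far boundary depends linearly on the unknown value at $z = 1$, two trial runs are enough to fix it. The abbreviation $a = m - d + 1$ and the function name are assumptions of the example.

```python
from fractions import Fraction

def mean_waiting_times(a):
    """Solve <S_z> = (<S_{z-1}> + <S_{z+1}>)/2 + 1 with <S_0> = <S_a> = 0, a = m - d + 1."""
    def shoot(first):
        s = [Fraction(0), Fraction(first)]
        for z in range(1, a):
            s.append(2 * s[z] - s[z - 1] - 2)   # (3.41) solved for <S_{z+1}>
        return s
    s0, s1 = shoot(0), shoot(1)
    # <S_a> is linear in the unknown <S_1>; choose it so that <S_a> = 0
    first = -s0[a] / (s1[a] - s0[a])
    return shoot(first)

m, d = 7, 2
a = m - d + 1
numeric = mean_waiting_times(a)
closed = [(m - d + 1) * z - z * z for z in range(a + 1)]
print(numeric == closed)  # True
```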

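In order to calculate the standard deviation of the synchronization time an
additional difference equation,

$$ \langle S_{d,z}^2 \rangle = \frac{1}{2} \langle (S_{d,z-1} + 1)^2 \rangle + \frac{1}{2} \langle (S_{d,z+1} + 1)^2 \rangle \,, \tag{3.44} $$

is necessary, which can be obtained in a similar manner as equation (3.41). Using
both (3.43) and (3.44) leads to the relation

$$ \langle S_{d,z}^2 \rangle - \langle S_{d,z} \rangle^2 = \frac{\langle S_{d,z-1}^2 \rangle - \langle S_{d,z-1} \rangle^2}{2} + \frac{\langle S_{d,z+1}^2 \rangle - \langle S_{d,z+1} \rangle^2}{2} + (m - d + 1 - 2z)^2 \tag{3.45} $$

for the variance of $S_{d,z}$. Applying a Z-transformation finally yields the solution

$$ \langle S_{d,z}^2 \rangle - \langle S_{d,z} \rangle^2 = \frac{(m - d + 1 - z)^2 + z^2 - 2}{3} \, \langle S_{d,z} \rangle \,. \tag{3.46} $$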
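A quick way to gain confidence in the variance formula is to insert it into the recursion and check that the residual vanishes. The sketch below does this with exact rational arithmetic, writing $a = m - d + 1$; the function names are hypothetical.

```python
from fractions import Fraction

def mean_s(a, z):
    """Mean waiting time (3.43) with a = m - d + 1."""
    return Fraction(a * z - z * z)

def var_s(a, z):
    """Variance of the waiting time, cf. (3.46)."""
    return Fraction((a - z) ** 2 + z ** 2 - 2, 3) * mean_s(a, z)

def residual(a, z):
    """Left-hand side minus right-hand side of (3.45)."""
    return var_s(a, z) - (var_s(a, z - 1) + var_s(a, z + 1)) / 2 - (a - 2 * z) ** 2

print(all(residual(a, z) == 0 for a in range(2, 20) for z in range(1, a)))  # True
```

The variance also vanishes at $z = 0$ and $z = a$, in line with the boundary conditions (3.42).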



While the first two moments of $S_{d,z}$ are sufficient to calculate the mean value
and the standard deviation of $T$, the probability distribution $P(S_{d,z} = t)$ must
be known in order to further analyze $P(T = t)$. For that purpose a result known
from the solution of the classical ruin problem [32] is used: The probability that
a fair game ends with the ruin of one player in time step $t$ is given by

$$ u(t) = \frac{1}{a} \sum_{k=1}^{a-1} \left[ \sin\left( \frac{k\pi z}{a} \right) + \sin\left( k\pi - \frac{k\pi z}{a} \right) \right] \sin\left( \frac{k\pi}{a} \right) \left[ \cos\left( \frac{k\pi}{a} \right) \right]^{t-1} . \tag{3.47} $$

In the random walk model $a - 1 = m - d$ denotes the number of possible positions
for two random walkers with distance $d$. And $u(t)$ is the probability distribution
of the random variable $S_{d,z}$. As before, $z = L + \min(w_{i,j}^A, w_{i,j}^B) + 1$ denotes the
initial position of the left random walker.
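The distribution of the waiting time can be evaluated numerically from the ruin formula. This sketch assumes the form $u(t) = \frac{1}{a} \sum_{k=1}^{a-1} [\sin(k\pi z/a) + \sin(k\pi - k\pi z/a)] \sin(k\pi/a) \cos^{t-1}(k\pi/a)$ for a fair game; the function name is hypothetical.

```python
import math

def ruin_probability(t, a, z):
    """u(t), cf. (3.47): probability that the reflection happens exactly at step t."""
    total = 0.0
    for k in range(1, a):
        phase = k * math.pi / a
        total += ((math.sin(phase * z) + math.sin(k * math.pi - phase * z))
                  * math.sin(phase) * math.cos(phase) ** (t - 1))
    return total / a

# a = 3 (two possible positions): a reflection occurs with probability 1/2 in every step
print([round(ruin_probability(t, 3, 1), 6) for t in range(1, 5)])
```

Summing $t\,u(t)$ over all $t$ should reproduce the mean waiting time $(m - d + 1)z - z^2$ from (3.43).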
3.3.2 Synchronization of two random walks

With these results one can determine the properties of the synchronization time
$T_{d,z}$ for two random walks starting at position $z$ and distance $d$. After the first
reflection at time $S_{d,z}$ one of the random walkers is located at the boundary. As
the model is symmetric, both possibilities, $z = 1$ or $z = m - d$, are equal. Hence
the second reflection takes place after $S_{d,z} + S_{d-1,1}$ steps and, consequently, the
total synchronization time is given by

$$ T_{d,z} = S_{d,z} + \sum_{j=1}^{d-1} S_{j,1} \,. \tag{3.48} $$

Using (3.43) leads to [27]

$$ \langle T_{d,z} \rangle = (m - d + 1) z - z^2 + \frac{1}{2} (d - 1)(2m - d) \tag{3.49} $$

for the expectation value of this random variable. In a similar manner one can
calculate the variance of $T_{d,z}$, because the parts of the synchronization process
are mutually independent.

Finally, one has to average over all possible initial conditions in order to
determine the mean value and the standard deviation of the synchronization
time $T$ for randomly chosen starting positions of the two random walkers [27]:

$$ \langle T \rangle = \frac{2}{m^2} \sum_{d=1}^{m-1} \sum_{z=1}^{m-d} \langle T_{d,z} \rangle = \frac{(m-1)^2}{3} + \frac{m-1}{3m} \,, \tag{3.50} $$

$$ \langle T^2 \rangle = \frac{2}{m^2} \sum_{d=1}^{m-1} \sum_{z=1}^{m-d} \langle T_{d,z}^2 \rangle = \frac{17m^5 - 51m^4 + 65m^3 - 45m^2 + 8m + 6}{90m} \,. \tag{3.51} $$

Thus the average number of attractive steps required to reach a synchronized
state, which is shown in figure 3.12, increases nearly proportional to $m^2$. In
particular for large system sizes $m$ the asymptotic behavior is given by

$$ \langle T \rangle \sim \frac{1}{3} m^2 \sim \frac{4}{3} L^2 \,. \tag{3.52} $$

As shown later in section 3.4.1, this result is consistent with the scaling behavior
$\langle t_{\mathrm{sync}} \rangle \propto L^2$ found in the case of neural synchronization [16].

In numerical simulations, both for random walks and neural networks, large
fluctuations of the synchronization time are observed. The reason for this effect
is that not only the mean value but also the standard deviation of $T$ [27],

$$ \sigma_T = \sqrt{\frac{7m^6 - 11m^5 - 15m^4 + 55m^3 - 72m^2 + 46m - 10}{90 m^2}} \,, \tag{3.53} $$

increases with the extension $m$ of the random walks.

Figure 3.12: Synchronization time of two random walks as a function of the
system size $m = 2L + 1$. Error bars denote the standard deviation observed in
1000 simulations. The analytical solution (3.50) is plotted as dashed curve.

A closer look at (3.52) and (3.53) reveals that $\sigma_T$ is asymptotically proportional
to $\langle T \rangle$:

$$ \sigma_T \sim \sqrt{\frac{7}{10}} \, \langle T \rangle \,. \tag{3.54} $$

Therefore the relative fluctuations $\sigma_T / \langle T \rangle$ are nearly independent of $m$ and not
negligible. Consequently, one cannot assume a typical synchronization time, but
has to take the full distribution $P(T = t)$ into account.
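The averages over the initial conditions can be carried out exactly for moderate $m$. The sketch below assumes the moments of the waiting time, $\langle S_{d,z} \rangle = (m-d+1)z - z^2$ and its variance $\frac{1}{3}[(m-d+1-z)^2 + z^2 - 2] \langle S_{d,z} \rangle$, and uses the independence of the parts so that variances add up. Function names are hypothetical.

```python
from fractions import Fraction
import math

def mean_s(a, z):
    return Fraction(a * z - z * z)

def var_s(a, z):
    return Fraction((a - z) ** 2 + z ** 2 - 2, 3) * mean_s(a, z)

def moments(m):
    """Exact <T> and <T^2> for random initial positions on m sites."""
    first = second = Fraction(0)
    for d in range(1, m):
        # later phases start at the boundary: S_{j,1} for j = d-1, ..., 1
        tail_mean = sum(mean_s(m - j + 1, 1) for j in range(1, d))
        tail_var = sum(var_s(m - j + 1, 1) for j in range(1, d))
        for z in range(1, m - d + 1):
            mean = mean_s(m - d + 1, z) + tail_mean
            var = var_s(m - d + 1, z) + tail_var     # independent parts: variances add
            first += mean
            second += var + mean * mean
    return 2 * first / m**2, 2 * second / m**2

for m in (7, 21, 201):
    t1, t2 = moments(m)
    print(m, float(t1), math.sqrt(t2 - t1 * t1) / t1)
```

The last column is the relative fluctuation $\sigma_T / \langle T \rangle$, which should approach $\sqrt{7/10} \approx 0.84$ for large $m$.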



3.3.3 Probability distribution 

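As $T_{d,z}$ is the sum over $S_{i,j}$ for each distance $i$ from $d$ to $1$ according to (3.48), its
probability distribution $P(T_{d,z} = t)$ is a convolution of $d$ functions $u(t)$ defined in
(3.47). The convolution of two different geometric sequences $b_n = \beta b^{n-1}$ and
$c_n = \gamma c^{n-1}$ is itself a linear combination of these sequences:

$$ \sum_{j=1}^{n-1} b_j c_{n-j} = \frac{\gamma}{b - c} \, b_n + \frac{\beta}{c - b} \, c_n \,. \tag{3.55} $$

Thus $P(T_{d,z} = t)$ can be written as a sum over geometric sequences, too:

$$ P(T_{d,z} = t) = \sum_{a=m-d+1}^{m} \sum_{k=1}^{a-1} c_{a,k}^{d,z} \left[ \cos\left( \frac{k\pi}{a} \right) \right]^{t-1} . \tag{3.56} $$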
Figure 3.13: Value of the coefficients $c_{m,1}$ and $c_{m,m-1}$ as a function of $m$. The
approximation given in (3.62) is shown as dashed curve.

In order to obtain $P(T = t)$ for random initial conditions, one has to average
over all possible starting positions of both random walkers. But even this result

$$ P(T = t) = \frac{2}{m^2} \sum_{d=1}^{m-1} \sum_{z=1}^{m-d} P(T_{d,z} = t) \tag{3.57} $$

can be written as a sum over many geometric sequences:

$$ P(T = t) = \sum_{a=2}^{m} \sum_{k=1}^{a-1} c_{a,k} \left[ \cos\left( \frac{k\pi}{a} \right) \right]^{t-1} . \tag{3.58} $$



For long times, however, only the terms with the largest absolute value of the
coefficient $\cos(k\pi/a)$ are relevant, because the others decline exponentially faster.
Hence one can neglect them in the limit $t \to \infty$, so that the asymptotic behavior
of the probability distribution is given by

$$ P(T = t) \sim \left[ c_{m,1} + (-1)^{t-1} c_{m,m-1} \right] \left[ \cos\left( \frac{\pi}{m} \right) \right]^{t-1} . \tag{3.59} $$

The two coefficients $c_{m,1}$ and $c_{m,m-1}$ in this equation can be calculated using
(3.55). This leads to the following result [27], which is shown in figure 3.13:

$$ c_{m,1} = \frac{\sin^2(\pi/m)}{m^2 \, m!} \sum_{d=1}^{m-1} \frac{2^{d+1} (m-d)!}{1 - \delta_{d,1} \cos(\pi/m)} \prod_{a=m-d+1}^{m-1} \sum_{k=1}^{a-1} \frac{\sin^2(k\pi/2)}{\cos(\pi/m) - \cos(k\pi/a)} \, \frac{\sin^2(k\pi/a)}{1 - \delta_{a,m-d+1} \cos(k\pi/a)} \,, \tag{3.60} $$

$$ c_{m,m-1} = \frac{\sin^2(\pi/m) \cos^2(m\pi/2)}{m^2 \, m!} \sum_{d=1}^{m-1} (-1)^{d-1} \frac{2^{d+1} (m-d)!}{1 + \delta_{d,1} \cos(\pi/m)} \prod_{a=m-d+1}^{m-1} \sum_{k=1}^{a-1} \frac{\sin^2(k\pi/2)}{\cos(\pi/m) + \cos(k\pi/a)} \, \frac{\sin^2(k\pi/a)}{1 - \delta_{a,m-d+1} \cos(k\pi/a)} \,. \tag{3.61} $$

Figure 3.14: Probability distribution $P(T = t)$ of the synchronization time for
$m = 7$ ($L = 3$). The numerical result is plotted as full curve. The dashed line
denotes the asymptotic function defined in (3.63).


As the value of $c_{m,m-1}$ is given by an alternating sum, this coefficient is much
smaller than $c_{m,1}$. Additionally, it is exactly zero for odd values of m because of
the factor $\cos^2(m\pi/2)$. The other coefficient $c_{m,1}$, however, can be approximated
by



    c_{m,1} \approx 0.324 \, m \left[ 1 - \cos\left( \frac{\pi}{m} \right) \right]    (3.62)



for $m \gg 1$, which is clearly visible in figure 3.13, too.

In the case of neural synchronization, m = 2L + 1 is always odd, so that
$c_{m,m-1} = 0$. Here P(T = t) asymptotically converges to a geometric probability
distribution for long synchronization times:



    P(T = t) \sim c_{m,1} \left[ \cos\left( \frac{\pi}{m} \right) \right]^{t-1} .    (3.63)



Figure 3.14 shows that this analytical solution describes P(T = t) well, except
for some deviations at the beginning of the synchronization process. But for
small values of t one can use the equations of motion for $p_{a,b}$ in order to calculate
P(T = t) iteratively.
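The geometric tail of P(T = t) can also be checked by direct simulation. The following Python sketch assumes the simplest form of the random-walk model: two walkers on the positions 1, ..., m receive the same random step of +1 or -1 in every time step, and a walker that would leave the interval stays at the boundary. The function names `sync_time` and `tail_decay` are illustrative only and do not come from the original work. For long times the estimated decay factor should approach $\cos(\pi/m)$.

```python
import math
import random
from collections import Counter


def sync_time(m, x, y, rng):
    """Number of steps until two walkers on {1, ..., m} coincide.

    Both walkers receive the same random step of +1 or -1 (identical noise).
    A walker that would leave the interval stays at the boundary instead
    (assumed reflecting boundary), which is the only way the distance shrinks.
    """
    t = 0
    while x != y:
        step = rng.choice((-1, 1))
        x = min(max(x + step, 1), m)
        y = min(max(y + step, 1), m)
        t += 1
    return t


def tail_decay(m, runs, t_lo, t_hi, seed=0):
    """Estimate the geometric decay factor of P(T = t) between t_lo and t_hi.

    Needs enough runs so that both time steps are actually observed.
    """
    rng = random.Random(seed)
    counts = Counter(
        sync_time(m, rng.randint(1, m), rng.randint(1, m), rng)
        for _ in range(runs)
    )
    # For a geometric tail, P(T = t_hi) / P(T = t_lo) = q ** (t_hi - t_lo).
    return (counts[t_hi] / counts[t_lo]) ** (1.0 / (t_hi - t_lo))


if __name__ == "__main__":
    m = 7  # corresponds to L = 3, since m = 2L + 1
    print("estimated decay factor:", tail_decay(m, 100_000, 30, 50))
    print("cos(pi / m):           ", math.cos(math.pi / m))
```

The estimate is noisy and still contains small contributions of faster decaying terms, so only approximate agreement should be expected.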
Figure 3.15: Average synchronization time $\langle T_N \rangle$ as a function of N for m = 7
(L = 3). Results of the numerical calculation using (3.64) are represented by
circles. The dashed line shows the expectation value of $T_N$ calculated in (3.69).

3.3.4 Extreme order statistics 

In this section the model is extended to N independent pairs of random walks
driven by identical random noise. This corresponds to two hidden units with
N weights, which start uncorrelated and reach full synchronization after $T_N$
attractive steps.

Although $\langle T \rangle$ is the mean value of the synchronization time for a pair of
weights, $w^A_{i,j}$ and $w^B_{i,j}$, it is not equal to $\langle T_N \rangle$. The reason is that the weight vectors
have to be completely identical in the case of full synchronization. Therefore $T_N$
is the maximum value of T observed in N independent samples corresponding to
the different weights of a hidden unit.

As the distribution function $P(T \le t)$ is known, the probability distribution
of $T_N$ is given by

    P(T_N \le t) = P(T \le t)^N .    (3.64)

Hence one can calculate the average value $\langle T_N \rangle$ using the numerically
computed distribution $P(T_N \le t)$. The result, which is shown in figure 3.15, indicates
that $\langle T_N \rangle$ increases logarithmically with the number of pairs of random walkers:

    \langle T_N \rangle - \langle T \rangle \propto \ln N .    (3.65)
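The logarithmic growth can be reproduced with a few lines of Python. This is only a sketch under a simplifying assumption: instead of the numerically computed $P(T \le t)$ it uses a purely geometric stand-in distribution with decay factor $\cos(\pi/m)$. The helper names are hypothetical. The average of the maximum follows from $\langle T_N \rangle = \sum_{t \ge 0} [1 - P(T \le t)^N]$, which holds for any non-negative integer random variable.

```python
import math


def geometric_cdf(q):
    """CDF of the stand-in law P(T = t) = (1 - q) * q**(t - 1) for t >= 1."""
    return lambda t: 1.0 - q ** t


def mean_max_sync_time(cdf, n_pairs, t_max):
    """<T_N> = sum over t >= 0 of [1 - P(T <= t)**N], truncated at t_max."""
    return sum(1.0 - cdf(t) ** n_pairs for t in range(t_max))


if __name__ == "__main__":
    cdf = geometric_cdf(math.cos(math.pi / 7))  # m = 7, i.e. L = 3
    for n_pairs in (10, 100, 1000, 10000):
        print(n_pairs, round(mean_max_sync_time(cdf, n_pairs, 5000), 2))
```

Each tenfold increase of N adds roughly the same number of steps to the result, which is the signature of a dependence on ln N.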

For large N only the asymptotic behavior of $P(T \le t)$ is relevant for the
distribution of $T_N$. The exponential decay of P(T = t) according to (3.59) yields
a Gumbel distribution for $P(T_N \le t)$ [33],

    G(t) = \exp\left( -e^{\frac{t_a - t}{t_b}} \right) ,    (3.66)
for $N \gg m$ with the parameters

    t_a = t_b \ln \frac{N c_{m,1}}{1 - \cos(\pi/m)}  \quad \text{and} \quad  t_b = -\frac{1}{\ln \cos(\pi/m)} .    (3.67)



Substituting (3.67) into (3.66) yields [2

    P(T_N \le t) = \exp\left( -\frac{N c_{m,1} \cos^t(\pi/m)}{1 - \cos(\pi/m)} \right)    (3.68)

as the distribution function for the total synchronization time of N pairs of
random walks ($N \gg m$). The expectation value of this probability distribution is
given by

    \langle T_N \rangle = t_a + t_b \gamma = -\frac{1}{\ln \cos(\pi/m)} \left( \gamma + \ln N + \ln \frac{c_{m,1}}{1 - \cos(\pi/m)} \right) .    (3.69)

Here $\gamma$ denotes the Euler-Mascheroni constant. For $N \gg m \gg 1$ the asymptotic
behavior of the synchronization time is given by

    \langle T_N \rangle \sim \frac{2 m^2}{\pi^2} \left( \gamma + \ln N + \ln \frac{2 m^2 c_{m,1}}{\pi^2} \right) .    (3.70)

Using (3.62) finally leads to the result [27]

    \langle T_N \rangle \sim \frac{2}{\pi^2} m^2 \left( \ln N + \ln(0.577 \, m) \right) ,    (3.71)

which shows that $\langle T_N \rangle$ increases proportional to ln N.
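The constant 0.577 in the last result can be traced back to the coefficient 0.324 of the approximation for $c_{m,1}$. The following worked step is a sketch that only assumes the leading-order expansion of the cosine for large m and neglects higher orders in 1/m:

```latex
\begin{align*}
  -\frac{1}{\ln\cos(\pi/m)} &\approx \frac{2m^2}{\pi^2} , \qquad
  1 - \cos\left(\frac{\pi}{m}\right) \approx \frac{\pi^2}{2m^2} , \\
  \frac{c_{m,1}}{1 - \cos(\pi/m)} &\approx 0.324\, m , \\
  \gamma + \ln(0.324\, m) &= \ln\left(e^{\gamma} \cdot 0.324\, m\right)
    \approx \ln(0.577\, m) .
\end{align*}
```

The last step uses $e^{\gamma} \approx 1.781$, so that $1.781 \cdot 0.324 \approx 0.577$.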

Of course, neural synchronization is somewhat more complex than this model 
using random walks driven by pairwise identical noise. Because of the structure 
of the learning rules the weights are not changed in each step. Including these idle 
steps certainly increases the synchronization time $t_{\mathrm{sync}}$. Additionally, repulsive
steps destroying synchronization are possible, too. Nevertheless, a similar scaling
law $\langle t_{\mathrm{sync}} \rangle \propto \ln N$ can be observed for the synchronization of two Tree Parity
Machines as long as repulsive effects have only little influence on the dynamics 
of the system. 



3.4 Random walk of the overlap 

The most important order parameter of the synchronization process is the
overlap between the weight vectors of the participating neural networks. The results
of section 3.1 and section 3.2 indicate that its change over time can be described
by a random walk with position dependent step sizes, $\langle \Delta p_a \rangle$, $\langle \Delta p_r \rangle$, and
transition probabilities, $P_a$, $P_r$ [31]. Of course, only the transition probabilities are
exact functions of p, while the step sizes fluctuate randomly around their average
values. Consequently, this model is not suitable for quantitative predictions, but
nevertheless one can determine important properties regarding the qualitative
behavior of the system. For this purpose, the average change of the overlap

    \langle \Delta p \rangle = P_a(p) \langle \Delta p_a(p) \rangle + P_r(p) \langle \Delta p_r(p) \rangle    (3.72)

in one synchronization step as a function of p is especially useful.

Figure 3.16: Average change of the overlap for K = 3, L = 5, and random walk
learning rule. Symbols denote results obtained from 1000 simulations, while the
lines have been calculated using (3.72).

Figure 3.16 clearly shows the difference between synchronization and learning
for K = 3. In the case of bidirectional interaction, $\langle \Delta p \rangle$ is always positive
until the process reaches the absorbing state at p = 1. But for unidirectional
interaction, there is a fixed point at $p_f < 1$. That is why a further increase of
the overlap is only possible by fluctuations. Consequently, there are two different
types of dynamics, which play a role in the process of synchronization.

3.4.1 Synchronization on average

If $\langle \Delta p \rangle$ is always positive for p < 1, each update of the weights has an attractive
effect on average. In this case repulsive steps delay the process of synchronization,
but the dynamics is dominated by the effect of attractive steps. Therefore it is
similar to that of random walks discussed in section 3.3.

As shown in figure 3.17 the distribution of the overlap gets closer to the
absorbing state at p = 1 in each time step. And the velocity of this process
is determined by $\langle \Delta p \rangle$. That is why p increases fast at the beginning of the
synchronization, but more slowly towards the end.

Figure 3.17: Distribution of the overlap in different time steps. These results were
obtained in 100 simulations for synchronization with K = 3, L = 5, N = 100,
and random walk learning rule.

However, the average change of the overlap depends on the synaptic depth L,
too. While the transition probabilities $P_a$ and $P_r$ are unaffected by a change of L,
the step sizes $\langle \Delta p_a \rangle$ and $\langle \Delta p_r \rangle$ shrink proportional to $L^{-2}$ according to (3.16) and
(3.18). Hence $\langle \Delta p \rangle$ also decreases proportional to $L^{-2}$, so that a large synaptic
depth slows down the dynamics. That is why one expects

    \langle t_{\mathrm{sync}} \rangle \propto \frac{1}{\langle \Delta p \rangle} \propto L^2    (3.73)

for the scaling of the synchronization time.

In fact, the probability $P(t_{\mathrm{sync}} \le t)$ to achieve identical weight vectors in
A's and B's neural networks in at most t steps is described well by a Gumbel
distribution (3.66):

    P^B_{\mathrm{sync}}(t) = \exp\left( -e^{\frac{t_a - t}{t_b}} \right) .    (3.74)

Similar to the model in section 3.3 the parameters $t_a$ and $t_b$ increase both
proportional to $L^2$, which is clearly visible in figure 3.18. Consequently, the average
synchronization time scales like $\langle t_{\mathrm{sync}} \rangle \propto L^2 \ln N$, in agreement with (3.71).

Additionally, figure 3.17 indicates that large fluctuations of the overlap can
be observed during the process of neural synchronization. For t = 0 the width of
the distribution is due to the finite number of weights and vanishes in the limit
$N \to \infty$. But later fluctuations are mainly amplified by the interplay of discrete
Figure 3.18: Probability distribution of the synchronization time for two Tree
Parity Machines with K = 3, L = 3, N = 1000, and random walk learning rule.
The histogram shows the relative frequency of occurrence observed in 10 000
simulations and the thick curve represents a fit of the Gumbel distribution. Fit
parameters for different values of L are shown in the inset.



attractive and repulsive steps. This effect cannot be avoided by increasing N,
because this does not change the step sizes. Therefore the order parameter p is 
not a self-averaging quantity [S^]: one cannot replace p by (p) in the equations 
of motion in order to calculate the time evolution of the overlap analytically. 
Instead, the whole probability distribution of the weights has to be taken into 
account. 



3.4.2 Synchronization by fluctuations 

If there is a fixed point at pf < 1, then the dynamics of neural synchronization 
changes drastically. As long as p < pf the overlap increases on average. But then 
a quasi-stationary state is reached. Further synchronization is only possible by 
fluctuations, which are caused by the discrete nature of attractive and repulsive 
steps. 

Figure 3.19 shows both the initial transient and the quasi-stationary state.
The latter can be described by a normal distribution with average value $p_f$ and
a standard deviation $\sigma_f$.

In order to determine the scaling of the fluctuations, a linear approximation
of $\langle \Delta p(p) \rangle$ is used as a simple model [31],

    \Delta p(t) = -\alpha_f \left( p(t) - p_f \right) + \beta_f \xi(t) ,    (3.75)

without taking the boundary conditions into account. Here the $\xi(t)$ are random
numbers with zero mean and unit variance. The two parameters are defined as

    \alpha_f = - \left. \frac{d}{dp} \langle \Delta p(p) \rangle \right|_{p = p_f} ,    (3.76)

    \beta_f = \sqrt{ \langle (\Delta p(p_f))^2 \rangle } .    (3.77)

Figure 3.19: Distribution of the overlap in different time steps. These results
were obtained in 100 simulations for the geometric attack with K = 3, L = 5,
N = 100, and random walk learning rule.
In this model, the solution of (3.75),

    p(t + 1) - p_f = \beta_f \sum_{i=0}^{t} (1 - \alpha_f)^{t-i} \xi(i) ,    (3.78)

describes the time evolution of the overlap. Here the initial condition $p(0) = p_f$
was assumed, which is admittedly irrelevant in the limit $t \to \infty$. Calculating the
variance of the overlap in the stationary state yields [31]

    \sigma_f^2 = \beta_f^2 \sum_{t=0}^{\infty} (1 - \alpha_f)^{2t} = \frac{\beta_f^2}{2 \alpha_f - \alpha_f^2} .    (3.79)
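The stationary variance of this linear model is easy to verify numerically. The sketch below iterates $\Delta p(t) = -\alpha_f (p(t) - p_f) + \beta_f \xi(t)$ and compares the sample variance with $\beta_f^2 / (2\alpha_f - \alpha_f^2)$. It assumes Gaussian noise for $\xi(t)$, which is only one possible choice with zero mean and unit variance, and the parameter values are hypothetical rather than taken from the simulations of the thesis.

```python
import random
import statistics


def stationary_variance(alpha_f, beta_f):
    """Variance of the overlap predicted by the geometric series."""
    return beta_f ** 2 / (2.0 * alpha_f - alpha_f ** 2)


def simulate_overlap(alpha_f, beta_f, p_f, steps, seed=0):
    """Iterate the linear model with Gaussian noise, starting at the fixed point.

    Boundary conditions (|p| <= 1) are ignored, as in the analytical model.
    """
    rng = random.Random(seed)
    p = p_f
    trajectory = []
    for _ in range(steps):
        p += -alpha_f * (p - p_f) + beta_f * rng.gauss(0.0, 1.0)
        trajectory.append(p)
    return trajectory


if __name__ == "__main__":
    alpha_f, beta_f, p_f = 0.05, 0.01, 0.6  # hypothetical values
    trajectory = simulate_overlap(alpha_f, beta_f, p_f, 200_000)
    print("simulated variance:", statistics.pvariance(trajectory[1000:]))
    print("predicted variance:", stationary_variance(alpha_f, beta_f))
```

For small $\alpha_f$ the prediction reduces to $\beta_f^2 / (2\alpha_f)$, which makes the dependence on the two parameters easy to read off.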

As the step sizes of the random walk in p-space decrease proportional to $L^{-2}$
for $L \gg 1$ according to (3.16) and (3.18), this is also the scaling behavior of the
parameters $\alpha_f$ and $\beta_f$. Thus one finds

    \sigma_f \propto \frac{1}{L}    (3.80)
Figure 3.20: Standard deviation of p at the fixed point for K = 3, N = 1000,
random walk learning rule, and unidirectional synchronization, averaged over
10 000 simulations. The inset shows the position of the fixed point.



for larger values of the synaptic depth. Although this simple model does not
include the more complex features of $\langle \Delta p(p) \rangle$, its scaling behavior is clearly
reproduced in figure 3.20. Deviations for small values of L are caused by finite-size
effects.

Consequently, E is unable to synchronize with A and B in the limit $L \to \infty$,
even if she uses the geometric attack. This is also true for any other algorithm
resulting in a dynamics of the overlap, which has a fixed point at $p_f < 1$.

For finite synaptic depth, however, the attacker has a chance of getting beyond
the fixed point at $p_f$ by fluctuations. The probability that this event occurs in any
given step is independent of t, once the quasi-stationary state has been reached.
Thus $P^E_{\mathrm{sync}}(t)$ is not given by a Gumbel distribution (3.66), but described well for
$t \gg t_0$ by an exponential distribution,

    P^E_{\mathrm{sync}}(t) = 1 - e^{-\frac{t - t_0}{t_f}} ,    (3.81)

with time constant $t_f$. This is clearly visible in figure 3.21. Because of $t_f \gg t_0$
one needs

    \langle t_{\mathrm{sync}} \rangle \approx \begin{cases} t_f \, e^{t_0 / t_f} & \text{for } t_0 < 0 \\ t_f + t_0 & \text{for } t_0 \ge 0 \end{cases} \approx t_f    (3.82)

steps on average to reach p = 1 using unidirectional learning.

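In the simplified model [31] with linear $\langle \Delta p(p) \rangle$ the mean time needed to
achieve full synchronization starting at the fixed point is given by

    t_f \approx \frac{1}{P(p = 1)} = \sqrt{2\pi} \, \sigma_f \, e^{\frac{(1 - p_f)^2}{2 \sigma_f^2}}    (3.83)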
Figure 3.21: Probability distribution of $t_{\mathrm{sync}}$ for K = 3, N = 1000, random walk
learning rule, and geometric attack. Symbols denote results averaged over 1000
simulations and the lines show fits with (3.81).

Figure 3.22: Time constant $t_f$ for synchronization by fluctuations. Symbols
denote results obtained in 1000 simulations of the geometric attack for K = 3,
N = 1000, and random walk learning rule. The line shows a fit with (3.85).
as long as the fluctuations are small. If $\sigma_f \ll 1 - p_f$, the assumption is reasonable,
that the distribution of p is not influenced by the presence of the absorbing state
at p = 1. Hence one expects

    t_f \propto e^{c L^2}    (3.84)

for the scaling of the time constant, as $\sigma_f$ changes proportional to $L^{-1}$, while $p_f$
stays nearly constant. And figure 3.22 shows that indeed $t_f$ grows exponentially
with increasing synaptic depth:

    t_f \propto e^{c_1 L + c_2 L^2} .    (3.85)
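A short numerical illustration shows how quickly the time constant grows. The sketch assumes that $t_f$ is the inverse of the density of a normal distribution with mean $p_f$ and standard deviation $\sigma_f$, evaluated at p = 1, and that $\sigma_f = s / L$ with a constant s. The values of `p_f` and `s` are hypothetical and are not fit results from the figures.

```python
import math


def escape_time(p_f, sigma_f):
    """Inverse of the normal density N(p_f, sigma_f) evaluated at p = 1."""
    return math.sqrt(2.0 * math.pi) * sigma_f * math.exp(
        (1.0 - p_f) ** 2 / (2.0 * sigma_f ** 2)
    )


if __name__ == "__main__":
    p_f = 0.65  # hypothetical position of the fixed point
    s = 0.25    # hypothetical prefactor in sigma_f = s / L
    for depth in range(1, 8):
        t_f = escape_time(p_f, s / depth)
        # ln(t_f) / L^2 approaches the constant (1 - p_f)^2 / (2 s^2)
        print(depth, f"{t_f:.3e}", round(math.log(t_f) / depth ** 2, 3))
```

The logarithm of the result is dominated by the term proportional to $L^2$, while the prefactor $\sigma_f$ only contributes a slowly varying correction.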

Thus the partners A and B can control the complexity of attacks on the neural
key-exchange protocol by choosing L. Or if E's effort stays constant, her success
probability drops exponentially with increasing synaptic depth. As shown in
chapter 4 this effect can be observed in the case of the geometric attack [16] and
even for advanced methods [22, 23].



3.5 Synchronization time 

As shown before the scaling of the average synchronization time $\langle t_{\mathrm{sync}} \rangle$ with
regard to the synaptic depth L depends on the function $\langle \Delta p(p) \rangle$, which is different
for bidirectional and unidirectional interaction. However, one has to consider
two other parameters. The probability of repulsive steps $P_r$ depends not only
on the interaction, but also on the number of hidden units. Therefore one can
switch between synchronization on average and synchronization by fluctuations
by changing K, which is the topic of section 3.5.1. Additionally, the chosen
learning rule influences the step sizes of attractive and repulsive steps. Section 3.5.2
shows that this affects $\langle \Delta p(p) \rangle$ and consequently the average synchronization
time $\langle t_{\mathrm{sync}} \rangle$, too.

3.5.1 Number of hidden units 

As long as $K \le 3$, A and B are able to synchronize on average. In this case $\langle t_{\mathrm{sync}} \rangle$
increases proportional to $L^2$. In contrast, E can only synchronize by fluctuations
as soon as K > 1, so that for her $\langle t_{\mathrm{sync}} \rangle$ grows exponentially with the synaptic
depth L. Consequently, A and B can reach any desired level of security by
choosing a suitable value for L.

However, this is not true for K > 3. As shown in figure 3.23, a fixed point at
$p_f < 1$ appears in the case of bidirectional synchronization, too. Therefore (3.73)
is not valid any more and $\langle t_{\mathrm{sync}} \rangle$ now increases exponentially with L. This is
clearly visible in figure 3.24. Consequently, Tree Parity Machines with four and
more hidden units cannot be used in the neural key-exchange protocol, except if
the synaptic depth is very small.
Figure 3.23: Average change of the overlap for L = 10, N = 1000, random walk
learning rule, and bidirectional synchronization. Symbols denote results obtained
from 100 simulations, while the lines have been calculated using (3.72).

Figure 3.24: Synchronization time for bidirectional interaction, N = 1000, and
random walk learning rule. Symbols denote results averaged over 10 000
simulations and the lines represent fits of the model $\langle t_{\mathrm{sync}} \rangle \propto L^2$.

Figure 3.25: Synchronization time for bidirectional interaction, N = 1000, and
random walk learning rule, averaged over 1000 simulations.



Figure 3.25 shows the transition between the two mechanisms of synchronization
clearly. As long as $K \le 3$ the scaling law $\langle t_{\mathrm{sync}} \rangle \propto L^2$ is valid, so that
the constant of proportionality $\langle t_{\mathrm{sync}} \rangle / L^2$ is independent of the number of hidden
units. Additionally, it increases proportional to ln KN, as the total number of
weights in a Tree Parity Machine is given by KN.

In contrast, $\langle t_{\mathrm{sync}} \rangle \propto L^2$ is not valid for K > 3. In this case $\langle t_{\mathrm{sync}} \rangle / L^2$ still
increases proportional to ln KN, but the steepness of the curve depends on the
synaptic depth, as the fluctuations of the overlap decrease proportional to $L^{-1}$.
Consequently, there are two sets of parameters, which allow for synchronization
using bidirectional interaction in a reasonable number of steps: the absorbing
state p = 1 is reached on average for $K \le 3$, whereas large enough fluctuations
drive the process of synchronization in the case of $L \le 3$ and $K \ge 4$. Otherwise,
a huge number of steps is needed to achieve full synchronization.



3.5.2 Learning rules 

Although the qualitative properties of neural synchronization are independent of
the chosen learning rule, one can observe quantitative deviations for Hebbian and
anti-Hebbian learning in terms of finite-size effects. Of course, these disappear in
the limit $L/\sqrt{N} \to 0$.

As shown in section 3.1, Hebbian learning enhances the effects of both repulsive
and attractive steps. This results in a decrease of $\langle \Delta p \rangle$ for small overlap, where
a lot of repulsive steps occur. But if A's and B's Tree Parity Machines are nearly
synchronized, attractive steps prevail, so that the average change of the overlap
is increased compared to the random walk learning rule. This is clearly visible
in figure 3.26. Of course, anti-Hebbian learning reduces both step sizes and one
observes the opposite effect.

Figure 3.26: Average change of the overlap for K = 3, L = 10, N = 1000,
and bidirectional synchronization. Symbols denote results obtained from 100
simulations, while the lines have been calculated using (3.72).

However, the average synchronization time $\langle t_{\mathrm{sync}} \rangle$ is mainly influenced by the
transition from $p \approx 1$ to p = 1, which is the slowest part of the synchronization
process. Therefore Hebbian learning decreases the average number of steps
needed to achieve full synchronization. This effect is clearly visible in figure 3.27.

Figure 3.27: Synchronization time for bidirectional interaction and K = 3.
Symbols denote results averaged over 10 000 simulations and the line shows the
corresponding fit from figure 3.24 for the random walk learning rule.

In contrast, anti-Hebbian learning increases $\langle t_{\mathrm{sync}} \rangle$. Here finite-size effects
cause problems for bidirectional synchronization, because one can even observe
$\langle \Delta p \rangle < 0$ for K = 3, if $L/\sqrt{N}$ is just sufficiently large. Then the synchronization
time increases faster than $L^2$. Consequently, this learning rule is only usable
in large systems, where finite-size effects are small and the observed behavior is
similar to that of the random walk learning rule.



Chapter 4 

Security of neural cryptography 



The security of the neural key-exchange protocol is based on the phenomenon 
analyzed in chapter 3: two Tree Parity Machines interacting with each other
synchronize much faster than a third neural network trained using their inputs 
and outputs as examples. In fact, the effort of the partners grows only polyno- 
mially with increasing synaptic depth, while the complexity of an attack scales 
exponentially with L. 

However, neural synchronization is a stochastic process driven by random
attractive and repulsive forces [19]. Therefore A and B are not always faster
than E, but there is a small probability $P_E$ that an attacker is successful before
the partners have finished the key exchange. Because of the different dynamics
$P_E$ drops exponentially with increasing L, so that the system is secure in the
limit $L \to \infty$ [16]. And in practice, one can reach any desired level of security
by just increasing L, while the effort of generating a key only grows moderately

3- 

Although this mechanism works perfectly, if the parameters of the protocol are



chosen correctly, other values can lead to an insecure key exchange [2]|. Therefore 
it is necessary to determine the scaling of Pe for different configurations and all 
known attack methods. By doing so, one can form an estimate regarding the 
minimum synaptic depth needed for some practical applications, too. 

While $P_E$ directly shows whether neural cryptography is secure, it does not
reveal the cause of this observation. For that purpose, it is useful to analyze
the mutual information I gained by partners and attackers during the process
of synchronization. Even though all participants receive the same messages, A 
and B can select the most useful ones for adjusting the weights. That is why 
they learn more about each other than E, who is only listening. Consequently, 
bidirectional interaction gives an advantage to the partners, which cannot be 
exploited by a passive attacker. 

Of course, E could try other methods instead of learning by listening.
Especially in the case of a brute-force attack, security depends on the number
of possible keys, which can be generated by the neural key-exchange protocol.
Therefore it is important to analyze the scaling of this quantity, too.



4.1 Success probability 

Attacks which are based on learning by listening have in common that the op- 
ponent E tries to synchronize one or more Tree Parity Machines with A's and 
B's neural networks. Of course, after the partners have reached identical weight 
vectors, they stop the process of synchronization, so that the number of available 
examples for the attack is limited. Therefore E's online learning is only successful, 
if she discovers the key before A and B finish the key exchange. 

As synchronization of neural networks is a stochastic process, there is a small
probability that E synchronizes faster with A than B. In actual fact, one could
use this quantity directly to describe the security of the key-exchange protocol.
However, the partners may not detect full synchronization immediately, so that
E is successful even if she achieves her goal shortly afterwards. Therefore $P_E$
is defined as the probability that the attacker knows 98 per cent of the weights
at synchronization time. Additionally, this definition reduces fluctuations in the
simulations, which are employed to determine $P_E$ [16].



4.1.1 Attacks using a single neural network 

For both the simple attack and the geometric attack E only needs one Tree Parity
Machine. So the complexity of these methods is small. But as already shown in
section 3.4.2, E can only synchronize by fluctuations if K > 1, while the partners
synchronize on average as long as $K \le 3$. That is why $t^E_{\mathrm{sync}}$ is usually much larger
than $t^B_{\mathrm{sync}}$ for K = 2 and K = 3. In fact, the probability of $t^E_{\mathrm{sync}} \le t^B_{\mathrm{sync}}$ in this
case is given by

    P(t^E_{\mathrm{sync}} \le t^B_{\mathrm{sync}}) = \int_{0}^{\infty} P^E_{\mathrm{sync}}(t) \, \frac{d}{dt} P^B_{\mathrm{sync}}(t) \, dt    (4.1)

under the assumption that the two synchronization times are uncorrelated
random variables. In this equation $P^B_{\mathrm{sync}}(t)$ and $P^E_{\mathrm{sync}}(t)$ are the cumulative
probability distributions of the synchronization time defined in (3.74) and (3.81),
respectively.

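In order to approximate this probability one has to look especially at the
fluctuations of the synchronization times $t^B_{\mathrm{sync}}$ and $t^E_{\mathrm{sync}}$. The width of the Gumbel
distribution,

    \sqrt{ \langle (t^B_{\mathrm{sync}})^2 \rangle - \langle t^B_{\mathrm{sync}} \rangle^2 } = \frac{\pi}{\sqrt{6}} \, t_b ,    (4.2)

for A and B is much smaller than the standard deviation of the exponential
distribution,

    \sqrt{ \langle (t^E_{\mathrm{sync}})^2 \rangle - \langle t^E_{\mathrm{sync}} \rangle^2 } = t_f ,    (4.3)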
Figure 4.1: Success probability of the geometric attack as a function of L.
Symbols denote results obtained in 10 000 simulations with N = 1000 and random
walk learning rule, while the lines represent fit results for model (4.7).

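for E because of $t_f \gg t_b$. Therefore one can approximate $P^B_{\mathrm{sync}}(t)$ in integral (4.1)
by $\Theta(t - \langle t^B_{\mathrm{sync}} \rangle)$, which leads to

    P(t^E_{\mathrm{sync}} \le t^B_{\mathrm{sync}}) \approx 1 - \exp\left( - \frac{\langle t^B_{\mathrm{sync}} \rangle}{\langle t^E_{\mathrm{sync}} \rangle} \right) .    (4.4)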
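The quality of this approximation can be checked numerically. The sketch below evaluates the integral over the Gumbel distribution of A and B and the exponential distribution of E by a simple midpoint sum and compares it with the expression based on the ratio of the mean synchronization times. All parameter values ($t_a$, $t_b$, $t_0$, $t_f$) are hypothetical and chosen only so that $t_f \gg t_b$; the function names are not taken from the original work.

```python
import math

EULER_GAMMA = 0.5772156649015329


def gumbel_cdf(t, t_a, t_b):
    """Cumulative Gumbel distribution of the partners' synchronization time."""
    return math.exp(-math.exp((t_a - t) / t_b))


def exponential_cdf(t, t_0, t_f):
    """Cumulative exponential distribution of E; zero before the offset t_0."""
    return 1.0 - math.exp(-(t - t_0) / t_f) if t > t_0 else 0.0


def success_by_integration(t_a, t_b, t_0, t_f, t_max, steps=20_000):
    """Midpoint Riemann-Stieltjes sum for the integral of P_E(t) dP_B(t)."""
    dt = t_max / steps
    total = 0.0
    previous = gumbel_cdf(0.0, t_a, t_b)
    for i in range(1, steps + 1):
        current = gumbel_cdf(i * dt, t_a, t_b)
        total += exponential_cdf((i - 0.5) * dt, t_0, t_f) * (current - previous)
        previous = current
    return total


def success_by_ratio(t_a, t_b, t_0, t_f):
    """Approximation 1 - exp(-<t_B> / <t_E>), valid for t_0 >= 0."""
    mean_b = t_a + EULER_GAMMA * t_b  # mean of the Gumbel distribution
    mean_e = t_0 + t_f                # mean of the shifted exponential
    return 1.0 - math.exp(-mean_b / mean_e)


if __name__ == "__main__":
    params = dict(t_a=500.0, t_b=50.0, t_0=0.0, t_f=20000.0)  # hypothetical
    print("integral:     ", success_by_integration(t_max=2000.0, **params))
    print("approximation:", success_by_ratio(**params))
```

With these values both numbers agree to several digits, because the Gumbel distribution is very narrow on the scale of $t_f$.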



Hence the success probability of an attack depends on the ratio of both average
synchronization times,

    \frac{\langle t^B_{\mathrm{sync}} \rangle}{\langle t^E_{\mathrm{sync}} \rangle} \propto \frac{L^2}{e^{c_1 L + c_2 L^2}} ,    (4.5)

which are functions of the synaptic depth L according to (3.73) and (3.85).
Consequently, L is the most important parameter for the security of the neural
key-exchange protocol.

In the case of $L \gg 1$ the ratio $\langle t^B_{\mathrm{sync}} \rangle / \langle t^E_{\mathrm{sync}} \rangle$ becomes very small, so that a
further approximation of (4.4) is possible. This yields the result

    P(t^E_{\mathrm{sync}} \le t^B_{\mathrm{sync}}) \propto L^2 e^{-c_1 L} e^{-c_2 L^2} ,    (4.6)

which describes the asymptotic behavior of the success probability: if A and B
increase the synaptic depth of their Tree Parity Machines, the success probability
of an attack drops exponentially [16]. Thus the partners can achieve any desired
level of security by changing L.
Figure 4.2: Success probability $P_E$ of the geometric attack as a function of K.
Symbols denote results obtained in 1000 simulations using the random walk
learning rule and N = 1000.



Although $P_E$ is not exactly identical to $P(t^E_{\mathrm{sync}} \le t^B_{\mathrm{sync}})$ because of its
definition, it has the expected scaling behavior,

    P_E \propto e^{-y_1 L - y_2 L^2} ,    (4.7)

which is clearly visible in figure 4.1. However, the coefficients $y_1$ and $y_2$ are
different from $c_1$ and $c_2$ due to interfering correlations between $t^B_{\mathrm{sync}}$ and $t^E_{\mathrm{sync}}$,
which have been neglected in the derivation of $P(t^E_{\mathrm{sync}} \le t^B_{\mathrm{sync}})$.

Additionally, figure 4.1 shows that the success probability of the geometric
attack depends not only on the synaptic depth L, but also on the number of
hidden units K. This effect, which results in different values of the coefficients,
is caused by a limitation of the algorithm: the output of at most one hidden unit
is corrected in each step. While this is sufficient to avoid all repulsive steps in
the case K = 1, there can be several hidden units with $\sigma^E_i \neq \sigma^A_i$ for K > 1. And
the probability for this event grows with increasing K, so that more and more
repulsive steps occur in E's neural network.

Consequently, A and B can achieve security against this attack not only by
increasing the synaptic depth L, but also by using a greater number of hidden
units K. Of course, for K > 3 large values of L are not possible, as the process of
synchronization is then driven by fluctuations. Nevertheless, figure 4.2 shows that
the success probability $P_E$ for the geometric attack drops quickly with increasing
K even in the case L = 1.

As the geometric attack is an element of both advanced attacks, majority
attack and genetic attack, one can also defeat these methods by increasing K.
Figure 4.3: Success probability $P_E$ of the simple attack as a function of K.
Symbols denote results obtained in 1000 simulations using the random walk learning
rule and N = 1000.



But then synchronization by mutual interaction and learning by listening become 
more and more similar. Thus one has to look at the success probability of the 
simple attack, too. 

As this method does not correct the outputs $\sigma^E_i$ of the hidden units at all,
the distance between the fixed point at $p_f < 1$ of the dynamics and the absorbing
state at p = 1 is greater than in the case of the geometric attack. That is why
a simple attacker needs larger fluctuations to synchronize and is less successful
than the more advanced attack as long as the number of hidden units is small.

In principle, scaling law (4.7) is also valid for this method. But one cannot
find a single successful simple attack in 1000 simulations using the parameters
K = 3 and L = 3 [13]. This is clearly visible in figure 4.3. Consequently, the
simple attack is not sufficient to break the security of the neural key-exchange
protocol for $K \le 3$.

But learning by listening without any correction works if the number of
hidden units is large. Here the probability of repulsive steps is similar for both
bidirectional and unidirectional interaction as shown in section 3.2. That is why
$P_E$ approaches a non-zero constant value in the limit $K \to \infty$.

These results show that K = 3 is the optimal choice for the cryptographic
application of neural synchronization. K = 1 and K = 2 are too insecure in
regard to the geometric attack. And for K > 3 the effort of A and B grows
exponentially with increasing L, while the simple attack is quite successful in the
limit $K \to \infty$. Consequently, one should only use Tree Parity Machines with
three hidden units for the neural key-exchange protocol.
Figure 4.4: Success probability of the genetic attack. Symbols denote results
obtained in 1000 simulations with K = 3, N = 1000, and random walk learning
rule. The lines show fit results using (4.8) as a model.



4.1.2 Genetic attack 

In the case of the genetic attack E's success depends mainly on the ability to
determine the fitness of her neural networks. Of course, the best quantity for
this purpose would be the overlap $p^{AE}$ between an attacking network and A's
Tree Parity Machine. However, it is not available, as E only knows the weight
vectors of her own networks. Instead the attacker uses the frequency of the event
$\tau^E = \tau^A$ in recent steps, which gives a rough estimate of $p^{AE}$.

Therefore a selection step only works correctly, if there are clear differences
between attractive and repulsive effects. As the step sizes $\langle \Delta p_a \rangle$ and $\langle \Delta p_r \rangle$
decrease proportional to $L^{-2}$, distinguishing both step types becomes more and
more complicated for E. Thus one expects a similar asymptotic behavior of $P_E$
in the limit $L \to \infty$ as observed before.

Figure 4.4 shows that this is indeed the case. The success probability drops
exponentially with increasing synaptic depth L,

    P_E \sim e^{-y (L - L_0)} ,    (4.8)

as long as $L > L_0$ [22]. But for $L < L_0$ E is nearly always successful.
Consequently, A and B have to use Tree Parity Machines with large synaptic depth in
order to secure the key-exchange protocol against this attack.

In contrast to the geometric method, E is able to improve her success probability
by increasing the maximum number of networks used for the genetic attack.
As shown in figure 4.5 this changes $L_0$, but the coefficient y remains
approximately constant. However, it is a logarithmic effect:

    L_0(M) = L_0(1) + L_E \ln M .    (4.9)

That is why the attacker has to increase the number of her Tree Parity Machines
exponentially,

    M \propto e^{L / L_E} ,    (4.10)

in order to compensate a change of L and maintain a constant success probability
$P_E$. But the effort needed to generate a key only increases proportional to
$L^2$. Consequently, the neural key-exchange protocol is secure against the genetic
attack in the limit $L \to \infty$.

Figure 4.5: Coefficients $L_0$ and y for the genetic attack as a function of the
number of attackers M. Symbols denote the results obtained in figure 4.4 for
K = 3, N = 1000, and random walk learning rule.
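The trade-off between the synaptic depth and the size of the attacker's ensemble can be made explicit with a small calculation. The sketch below combines the exponential law $P_E \sim e^{-y(L - L_0)}$ with the logarithmic dependence $L_0(M) = L_0(1) + L_E \ln M$. The coefficients `y`, `l0_single` and `l_e` are hypothetical placeholders and not the fit results of the thesis, and the function names are illustrative.

```python
import math


def threshold_depth(m_attackers, l0_single, l_e):
    """L_0(M) = L_0(1) + L_E * ln(M)."""
    return l0_single + l_e * math.log(m_attackers)


def success_probability(depth, m_attackers, y, l0_single, l_e):
    """P_E = exp(-y (L - L_0(M))), capped at 1 for L below L_0(M)."""
    l0 = threshold_depth(m_attackers, l0_single, l_e)
    return min(1.0, math.exp(-y * (depth - l0)))


def attackers_needed(depth, target, y, l0_single, l_e):
    """Ensemble size M for which the two laws above give P_E = target."""
    l0_required = depth + math.log(target) / y
    return math.exp((l0_required - l0_single) / l_e)


if __name__ == "__main__":
    y, l0_single, l_e = 1.0, 2.0, 0.5  # hypothetical coefficients
    for depth in (8, 10, 12, 14):
        m = attackers_needed(depth, 0.01, y, l0_single, l_e)
        print(depth, f"{m:.3e}")
```

In this model each increase of L by one multiplies the required number of networks by the constant factor $e^{1/L_E}$, whereas the effort of A and B only grows like $L^2$.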



4.1.3 Majority attack 

An opponent who has an ensemble of M Tree Parity Machines can also use the
majority attack to improve $P_E$. This method neither generates variants nor
selects the fittest networks, so that their number remains constant throughout
the process of synchronization. Instead the majority decision of the M Tree
Parity Machines determines the internal representation $(\sigma^E_1, \ldots, \sigma^E_K)$ used for the
purpose of learning. Therefore this algorithm implements, in fact, the optimal
Bayes learning rule [28], [35], [36].



In order to describe the state of the ensemble one can use two order
parameters. First, the mean value of the overlap between corresponding hidden units in
A's and E's neural networks,

    p^{AE}_i = \frac{1}{M} \sum_{m=1}^{M} \frac{ \mathbf{w}^A_i \cdot \mathbf{w}^{E,m}_i }{ \| \mathbf{w}^A_i \| \, \| \mathbf{w}^{E,m}_i \| } ,    (4.11)

indicates the level of synchronization. Here the index m denotes the m-th
attacking network. Similar to other attacks, E starts without knowledge in regard
to A, so that $p^{AE}_i = 0$ at the beginning. And finally she is successful, if $p^{AE}_i = 1$
is reached.

Second, the average overlap between two attacking networks,

    p^{EE}_i = \frac{1}{M(M-1)} \sum_{m=1}^{M} \sum_{n \neq m} \frac{ \mathbf{w}^{E,m}_i \cdot \mathbf{w}^{E,n}_i }{ \| \mathbf{w}^{E,m}_i \| \, \| \mathbf{w}^{E,n}_i \| } ,    (4.12)

describes the correlations in E's ensemble. At the beginning of the
synchronization $p^{EE}_i = 0$, because all weights are initialized randomly and uncorrelated. But
as soon as $p^{EE}_i = 1$ is reached, E's networks are identical, so that the performance
of the majority attack is reduced to that of the geometric method.

In the large $M$ limit, the majority vote of the ensemble is identical to the
output values $\sigma_i$ of a single neural network, which is located at the
center of mass [35]. Its weight vectors are given by the normalized average
over all of E's Tree Parity Machines:

$$\mathbf{w}_i^{E,cm} = \frac{1}{M} \sum_{m=1}^{M} \frac{\mathbf{w}_i^{E,m}}{\|\mathbf{w}_i^{E,m}\|} . \tag{4.13}$$

The normalization of $\mathbf{w}_i^{E,m}$ corresponds to the fact that each
member has exactly one vote. Using (4.11) and (4.12) one can calculate the
overlap between the center of mass and A's Tree Parity Machine:

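$$\rho_i^{cm} = \frac{\mathbf{w}_i^{A} \cdot \mathbf{w}_i^{E,cm}}{\|\mathbf{w}_i^{A}\| \, \|\mathbf{w}_i^{E,cm}\|} = \frac{\rho_i^{AE}}{\sqrt{\rho_i^{EE} + \frac{1}{M} \left( 1 - \rho_i^{EE} \right)}} . \tag{4.14}$$

Consequently, the effective overlap between A and E is given by

$$\rho_i^{cm} \sim \frac{\rho_i^{AE}}{\sqrt{\rho_i^{EE}}} \tag{4.15}$$

in the limit $M \to \infty$. This result is important for the dynamics of the
synchronization process between A and E, because $\rho_i^{cm}$ replaces
$\rho_i^{AE}$ in the calculation of the transition probabilities $P_a(\rho)$
and $P_r(\rho)$, whenever the majority vote is used to adjust the weights. But
the step sizes $\langle \Delta\rho_a(\rho) \rangle$ and
$\langle \Delta\rho_r(\rho) \rangle$ are not affected by this modification of
the algorithm. Therefore the average change of the overlap between A and E is
given by

$$\langle \Delta\rho_i^{AE} \rangle = P_a(\rho_i^{cm}) \, \langle \Delta\rho_a(\rho_i^{AE}) \rangle + P_r(\rho_i^{cm}) \, \langle \Delta\rho_r(\rho_i^{AE}) \rangle , \tag{4.16}$$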



[Plot for figure 4.6: overlaps $\rho^{AB}$, $\rho^{AE}$, $\rho^{EE}$, and
$\rho^{cm}$ as functions of the time step $t$, $0 \le t \le 1000$.]



Figure 4.6: Process of synchronization in the case of a majority attack with 
M = 100 attacking networks, averaged over 1000 simulations for K = 3, L = 5, 
N = 1000, and random walk learning rule. 



if the majority vote is used for updating the weight vectors. Although this
equation is strictly correct only in the limit $M \to \infty$, $M = 100$ is
already sufficient for the majority attack [21].
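The relation between the overlap of the center of mass and the two order
parameters of the ensemble can be checked numerically. The following sketch
uses only the Python standard library. The helper names, the toy ensemble built
by `correlated_ensemble`, and all parameter values are assumptions made for
illustration, and the code treats a single hidden unit.

```python
import math
import random


def normalize(v):
    """Scale a weight vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def ensemble_overlaps(w_a, w_e):
    """Return (rho_AE, rho_EE, rho_cm) for one hidden unit.

    w_a is A's weight vector, w_e a list of M attacker weight vectors.
    """
    a = normalize(w_a)
    e = [normalize(w) for w in w_e]
    m = len(e)
    rho_ae = sum(dot(a, w) for w in e) / m
    rho_ee = sum(dot(e[i], e[j]) for i in range(m) for j in range(m) if i != j)
    rho_ee /= m * (m - 1)
    # Center of mass: average of the normalized attacker vectors.
    cm = [sum(w[k] for w in e) / m for k in range(len(a))]
    rho_cm = dot(a, cm) / math.sqrt(dot(cm, cm))
    return rho_ae, rho_ee, rho_cm


def correlated_ensemble(w_a, m, L, copy_prob, rng):
    """Toy ensemble: each weight is copied from A with probability copy_prob."""
    return [
        [wa if rng.random() < copy_prob else rng.randint(-L, L) for wa in w_a]
        for _ in range(m)
    ]


if __name__ == "__main__":
    rng = random.Random(1)
    M, N, L = 20, 200, 5
    w_a = [rng.randint(-L, L) for _ in range(N)]
    w_e = correlated_ensemble(w_a, M, L, 0.3, rng)
    rho_ae, rho_ee, rho_cm = ensemble_overlaps(w_a, w_e)
    predicted = rho_ae / math.sqrt(rho_ee + (1.0 - rho_ee) / M)
    print(rho_ae, rho_ee, rho_cm, predicted)
```

For any finite ensemble the directly measured value of $\rho^{cm}$ should agree
with $\rho^{AE} / \sqrt{\rho^{EE} + (1 - \rho^{EE})/M}$ up to rounding errors,
and it exceeds $\rho^{AE}$ as long as the attackers are not yet identical.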



However, as all attacking networks learn the same internal representation, the
internal overlap $\rho_i^{EE}$ is increased by the resulting attractive effect:

$$\langle \Delta\rho_i^{EE} \rangle = \frac{1}{2} \, \langle \Delta\rho_a(\rho_i^{EE}) \rangle . \tag{4.17}$$

Hence $\rho_i^{EE}$ grows faster than $\rho_i^{AE}$ in these steps, so that the
advantage of the majority vote decreases whenever it is used [37].

This is clearly visible in figure 4.6. In the first 100 steps the attacker only
uses the geometric attack. Here $\rho^{EE} \approx \rho^{AE}$, which can also
be observed for an ensemble of perceptrons learning according to the Bayes rule
[28]. At $t = 100$, using the majority vote gives E a huge advantage compared
to the geometric attack, because
$\rho^{cm} \approx \sqrt{\rho^{AE}} > \rho^{AE}$, so that the probability of
repulsive steps is reduced. Therefore the attacker is able to maintain
$\rho^{AE} \approx \rho^{AB}$ for some time. Later $\rho^{EE}$ increases and
this benefit vanishes.

However, the attacker is unable to reach full synchronization on average. As
shown in figure 4.7, there is still a fixed point at $\rho_f < 1$ in the case
of the majority attack, although its distance to the absorbing state is smaller
than for the geometric attack. Consequently, one expects a higher success
probability $P_E$, but similar scaling behavior.

Figure 4.8 shows that this is indeed the case for the random walk learning
rule and for anti-Hebbian learning. But if A and B use the Hebbian learning
rule instead, $P_E$ reaches a constant non-zero value in the limit
$L \to \infty$ [21]. Apparently, the change of the weight distribution caused
by Hebbian learning is enough to break the security of the neural key-exchange
protocol. Consequently, A and B cannot use this learning rule for cryptographic
purposes.

Figure 4.8: Success probability $P_E$ of the majority attack with $M = 100$
attacking networks for $K = 3$ and $N = 1000$. Symbols denote results obtained
in 10 000 simulations and the line shows the corresponding fit from figure 4.9.

Figure 4.9: Success probability of different attacks as a function of the
synaptic depth $L$. Symbols denote results obtained in 1000 simulations using
the random walk learning rule, $K = 3$, and $N = 1000$, while the lines show
fit results using model (4.8). The number of attacking networks is $M = 4096$
for the genetic attack and $M = 100$ for the majority attack.

While anti-Hebbian learning is secure against the majority attack, a lot of
finite-size effects occur in smaller systems, which do not fulfill the
condition $L \ll \sqrt{N}$. In this case $\langle t_{sync} \rangle$ increases
faster than $L^2$, as shown in section 3.5. Fortunately, A and B can avoid this
problem by just using the random walk learning rule.



4.1.4 Comparison of the attacks 

As E knows the parameters of the neural key-exchange protocol, she is able to
select the best method for an attack. Consequently, one has to compare the
available attacks in order to determine the maximum of $P_E$.

Figure 4.9 shows the result. Here (4.8) has been used as fit model even for
the geometric attack, which is a special case of both advanced attacks for
$M = 1$. Of course, by doing so the curvature visible in figure 4.1 is
neglected, so that extrapolating $P_E$ for $L \to \infty$ overestimates the
efficiency of this method.

All three attacks have similar scaling behavior, but the coefficients $L_0$
and $y$ obtained by fitting with (4.8) depend on the chosen method. The
geometric attack is the simplest method considered in figure 4.9. Therefore its
success probability is lower than for the more advanced methods. As the
exponent $y$ is large, A and B can easily secure the key-exchange protocol
against this method by just increasing $L$.

In the case of the majority attack, $P_E$ is higher, because the cooperation
between the attacking networks reduces the coefficient $y$. A and B have to
compensate for this by further stepping up $L$. In contrast, the genetic attack
merely increases $L_0$, but $y$ does not change significantly compared to the
geometric attack. Therefore the genetic attack is only better if $L$ is not too
large. Otherwise E gains most by using the majority attack [22].



While A and B can reach any desired level of security by increasing the
synaptic depth, this is difficult to realize in practice. Extrapolation of
(4.8) shows clearly that $P_E \approx 10^{-4}$ is achieved for $K = 3$,
$L = 57$, $N = 1000$, and random walk learning rule. But the average
synchronization time $\langle t_{sync} \rangle \approx 1.6 \cdot 10^{5}$ is
quite large in this case. Consequently, it is reasonable to develop an improved
neural key-exchange protocol [19, 23, 24], which is the topic of chapter 5.



4.2 Security by interaction 

The main difference between the partners and the attacker in neural
cryptography is that A and B are able to influence each other by communicating
their output bits $\tau^A$ and $\tau^B$, while E can only listen to these
messages. Of course, A and B use their advantage to select suitable input
vectors for adjusting the weights. As shown in chapter 3, this finally leads to
different synchronization times for partners and attackers.

However, there are more effects, which show that the two-way communication 
between A and B makes attacking the neural key-exchange protocol more difficult 
than simple learning of examples. These confirm that the security of neural 
cryptography is based on the bidirectional interaction of the partners. 

4.2.1 Version space 

The time series of pairs $(\tau^A, \tau^B)$ of output bits produced by two
interacting Tree Parity Machines depends not only on the sequence of input
vectors $\mathbf{x}_i(t)$, but also on the initial weight vectors
$\mathbf{w}_i^{A/B}(0)$ of both neural networks. Of course, E can reproduce the
time series $\tau^B(t)$ exactly, if she uses a third Tree Parity Machine with
$\mathbf{w}_i^E(0) = \mathbf{w}_i^B(0)$, because the learning rules are
deterministic. But choosing other initial values for the weights in E's neural
network may lead to the same sequence of output bits. The set of initial
configurations with this property is called version space [28]. Its size
$n_{vs}$ is a monotonically decreasing function of the length $t$ of the
sequence, because each new element $\mathbf{x}_i(t)$ imposes an additional
condition on $\mathbf{w}_i^E(0)$. Of course, it does not matter whether E uses
the simple attack or the geometric attack, as both algorithms are identical
under the condition $\tau^E(t) = \tau^B(t)$.

Figure 4.10: Version space of interacting Tree Parity Machines with $K = 3$,
$L = 1$, $N = 2$, and random walk learning rule, averaged over 1000
simulations.

Figure 4.10 shows that the size of the version space shrinks by a factor $1/2$
in each step at the beginning of the time series. Here the possible
configurations of the weights are still uniformly distributed, so that each
output gives one bit of information about the initial conditions.

But later neural networks which have started with similar weight vectors
synchronize. That is why the configurations are no longer uniformly distributed
and the shrinking of $n_{vs}$ becomes slower and slower. Finally, all Tree
Parity Machines in the version space have synchronized with each other. From
then on $n_{vs}$ stays constant.

However, the size of the version space in the limit $t \to \infty$ depends on
the properties of the time series. If A and B start fully synchronized, they do
not need to influence each other and all input vectors in the sequence are used
to update the weights. In this case E has to learn randomly chosen examples of
a time-dependent rule [6]. In contrast, if A and B start with uncorrelated
weight vectors, they select a part of the input sequence for adjusting their
weights. For them this interaction increases the speed of the synchronization
process, because only suitable input vectors remain. But it also decreases
$n_{vs}(t \to \infty)$, so that imitating B is more difficult for E.
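For a very small system the version space can be counted by exhaustive
enumeration. The following sketch does this for $K = 3$, $L = 1$, $N = 2$ and
the random walk learning rule. It is a simplified illustration rather than the
simulation code behind figure 4.10: the function names are hypothetical, the
convention that a vanishing local field is mapped to $\sigma_i = -1$ is an
assumption, and E is modeled as a copy of B's learning algorithm that is
discarded as soon as its output differs from $\tau^B$.

```python
import itertools
import math
import random

K, L, N = 3, 1, 2  # tiny system: (2L + 1)**(K * N) = 729 configurations


def outputs(w, x):
    """Hidden-unit outputs sigma_i and total output tau of one machine."""
    sigma = []
    for w_i, x_i in zip(w, x):
        h = sum(a * b for a, b in zip(w_i, x_i))
        sigma.append(1 if h > 0 else -1)  # assumption: h = 0 is mapped to -1
    return sigma, math.prod(sigma)


def random_walk_update(w, x, sigma, tau):
    """Move the weights of hidden units with sigma_i == tau by x_i, clipped to [-L, L]."""
    return tuple(
        tuple(max(-L, min(L, a + b)) for a, b in zip(w_i, x_i)) if s == tau else w_i
        for w_i, x_i, s in zip(w, x, sigma)
    )


def version_space_sizes(steps, rng):
    """Number of initial configurations of E that still reproduce tau^B."""
    configs = [
        tuple(tuple(flat[i * N:(i + 1) * N]) for i in range(K))
        for flat in itertools.product(range(-L, L + 1), repeat=K * N)
    ]
    w_a, w_b = rng.choice(configs), rng.choice(configs)
    candidates = list(configs)  # current weights of each surviving initial state
    sizes = [len(candidates)]
    for _ in range(steps):
        x = tuple(tuple(rng.choice((-1, 1)) for _ in range(N)) for _ in range(K))
        sigma_a, tau_a = outputs(w_a, x)
        sigma_b, tau_b = outputs(w_b, x)
        agree = tau_a == tau_b
        survivors = []
        for w_e in candidates:
            sigma_e, tau_e = outputs(w_e, x)
            if tau_e != tau_b:
                continue  # this initial configuration left the version space
            survivors.append(random_walk_update(w_e, x, sigma_e, tau_e) if agree else w_e)
        candidates = survivors
        if agree:
            w_a = random_walk_update(w_a, x, sigma_a, tau_a)
            w_b = random_walk_update(w_b, x, sigma_b, tau_b)
        sizes.append(len(candidates))
    return sizes


if __name__ == "__main__":
    print(version_space_sizes(30, random.Random(0)))
```

The returned list never increases, and it never drops below one, because the
copy that starts with B's own initial weights always reproduces $\tau^B$.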

Consequently, the two-way communication between the partners gives them 
an advantage, which cannot be exploited by a passive attacker. Therefore bidi- 
rectional interaction is important for the security of the neural key-exchange 
protocol.

Figure 4.11: Mutual information between partners and attackers. Symbols denote
simulation results for $K = 3$, $L = 5$, $N = 1000$, and random walk learning
rule, while the lines show the results of corresponding iterative calculations
for $N \to \infty$.

4.2.2 Mutual information 

Instead of using the overlap $\rho$ as order parameter, which is closely
related to the dynamics of the neural networks and the theory of learning, one
can also look at the process of synchronization from the point of view of
information theory. For this purpose, the mutual information $I^{AB}(t)$
defined in (2.18) describes A's and B's knowledge about each other. Similarly
$I^{AE}(t)$ measures how much information E has gained in regard to A at time
$t$ by listening to the communication of the partners.

All participants start with zero knowledge about each other, so that
$I^{AB} = 0$ and $I^{AE} = 0$ at the beginning of the key exchange. In each
step there are several sources of information. The input vectors
$\mathbf{x}_i(t)$ determine in which directions the weights are moved, if the
learning rule is applied. And, together with the outputs $\tau^A(t)$ and
$\tau^B(t)$, they form an example, which gives some information about the
current weight vectors in A's and B's Tree Parity Machines. Although all
participants have access to $\mathbf{x}_i(t)$, $\tau^A(t)$, and $\tau^B(t)$,
the increase of the mutual information $I$ depends on the algorithm used to
adjust the weights.

This is clearly visible in figure 4.11. While the partners reach full
synchronization with $I^{AB} = S_0$ quickly, the attacker is much slower. And E
performs better if she uses the geometric method instead of the simple attack.
Of course, these observations correspond to those presented in chapter 3.

While the differences between E's attack methods are caused by the learning
algorithms, which transform the available information into more or less
knowledge about A's and B's weights, this is not the only reason for
$I^{AE}(t) < I^{AB}(t)$. In order to synchronize the partners have to agree on
some weight vectors $\mathbf{w}_i$, which are, in fact, functions of the
sequence $\mathbf{x}_i(t)$ and the initial conditions $\mathbf{w}_i^{A/B}(0)$.
So they already have some information, which they share during the process of
synchronization. Therefore the partners gain more mutual information $I$ from
each message than an attacker, who has no prior knowledge about the outcome of
the key exchange. Consequently, it is the bidirectional interaction which gives
A and B an advantage over E.



4.3 Number of keys

Although all attacks on the neural key-exchange protocol known up to now are
based on the training of Tree Parity Machines with examples generated by the
partners A and B, this is not a necessary condition. Instead the opponent could
try a brute-force attack. Of course, a success of this method is nearly
impossible without additional information, as there are $(2L + 1)^{KN}$
different configurations for the weights of a Tree Parity Machine. However, E
could use some clever algorithm to determine which keys are generated with high
probability for a given input sequence. Trying out these would be a feasible
task as long as there are not too many. Consequently, a large number of keys is
important for the security of neural cryptography, especially against
brute-force attacks.

4.3.1 Synchronization without interaction 

If the weights are chosen randomly, there are $(2L + 1)^{2KN}$ possible
configurations for a pair of Tree Parity Machines. But the neural key-exchange
protocol can generate at most $(2L + 1)^{KN}$ different keys. Consequently,
sets of different initial conditions exist, which lead to identical results.
That is why synchronization even occurs without any interaction between neural
networks besides a common sequence of input vectors.

In order to analyze this effect the following system consisting of two pairs of
Tree Parity Machines is used:

$$\mathbf{w}_i^{A+} = g\left( \mathbf{w}_i^{A} + f(\sigma_i^{A}, \tau^{A}, \tau^{B}) \, \mathbf{x}_i \right) , \tag{4.18}$$
$$\mathbf{w}_i^{B+} = g\left( \mathbf{w}_i^{B} + f(\sigma_i^{B}, \tau^{A}, \tau^{B}) \, \mathbf{x}_i \right) , \tag{4.19}$$
$$\mathbf{w}_i^{C+} = g\left( \mathbf{w}_i^{C} + f(\sigma_i^{C}, \tau^{C}, \tau^{D}) \, \mathbf{x}_i \right) , \tag{4.20}$$
$$\mathbf{w}_i^{D+} = g\left( \mathbf{w}_i^{D} + f(\sigma_i^{D}, \tau^{C}, \tau^{D}) \, \mathbf{x}_i \right) . \tag{4.21}$$

All four neural networks receive the same sequence of input vectors
$\mathbf{x}_i$, but both pairs communicate their output bits only internally.
Thus A and B as well as C and D synchronize using one of the available learning
rules, while correlations caused by common inputs are visible in the overlap
$\rho_i^{AC}$. Because of the symmetry in this system, $\rho_i^{AD}$,
$\rho_i^{BC}$, and $\rho_i^{BD}$ have the same properties as this quantity, so
that it is sufficient to look at $\rho_i^{AC}$ only.

Of course, synchronization of networks which do not interact with each other
is much more difficult and takes a longer time than performing the normal
key-exchange protocol. Thus full internal synchronization of the pairs usually
happens well before A's and C's weight vectors become identical, so that
$\rho_i^{AB} = 1$ and $\rho_i^{CD} = 1$ are assumed for the calculation of
$\langle \Delta\rho_i^{AC}(\rho_i^{AC}) \rangle$.

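As before, both attractive and repulsive steps are possible. In the case of
identical overlap between corresponding hidden units, random walk learning
rule, and $K > 1$, the probability of these step types is given by:

$$P_a = \frac{1}{2} \sum_{i=0}^{(K-1)/2} \binom{K-1}{2i} (1 - \epsilon)^{K-2i-1} \, \epsilon^{2i} , \tag{4.22}$$

$$P_r = \sum_{i=1}^{K/2} \binom{K-1}{2i-1} (1 - \epsilon)^{K-2i} \, \epsilon^{2i-1} . \tag{4.23}$$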

Here $\epsilon$ denotes the generalization error defined in equation (3.20) in
regard to $\rho^{AC}$. For $K = 1$ only attractive steps occur, so that
$P_a = 1$, which is similar to the geometric attack. But in the case of
$K = 3$, one finds

$$P_a = \frac{1}{2} \left[ 1 - 2 (1 - \epsilon) \epsilon \right] , \tag{4.24}$$
$$P_r = 2 (1 - \epsilon) \epsilon . \tag{4.25}$$
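The values (4.24) and (4.25) can be cross-checked by enumerating all
combinations of hidden-unit outputs. The sketch below is an illustration under
the assumptions stated in the text: both pairs are internally synchronized, so
each network applies the random walk learning rule in every step, and the
outputs of corresponding hidden units in A and C differ independently with
probability $\epsilon$. The function name `step_probabilities` is hypothetical.

```python
import math
from itertools import product


def step_probabilities(eps, K=3):
    """Return (P_a, P_r) for the first hidden unit of A and C.

    With the random walk learning rule a hidden unit moves its weights by x_i
    if and only if sigma_i equals the total output tau of its own machine.
    """
    p_a = p_r = 0.0
    for sigma_a in product((-1, 1), repeat=K):
        for differs in product((False, True), repeat=K):
            prob = 0.5 ** K  # sigma^A uniformly distributed
            for d in differs:
                prob *= eps if d else 1.0 - eps
            sigma_c = tuple(-s if d else s for s, d in zip(sigma_a, differs))
            moves_a = sigma_a[0] == math.prod(sigma_a)
            moves_c = sigma_c[0] == math.prod(sigma_c)
            if moves_a and moves_c:
                p_a += prob  # both weight vectors move by x_1: attractive step
            elif moves_a != moves_c:
                p_r += prob  # only one weight vector moves: repulsive step
    return p_a, p_r


if __name__ == "__main__":
    eps = 0.2
    print(step_probabilities(eps))
    print(0.5 * (1 - 2 * (1 - eps) * eps), 2 * (1 - eps) * eps)
```

For $K = 1$ the same enumeration yields only attractive steps, in agreement
with the statement above.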

As long as $\rho^{AC} > 0$ the probability of repulsive steps is higher than
$P_r^E = \epsilon$ for the simple attack. Consequently, one expects that the
dynamics of $\rho^{AC}$ has a fixed point at $\rho_f^{AC} < \rho_f^{E} < 1$ and
synchronization is only possible by fluctuations.

Figure 4.12 shows that this is indeed the case. As more repulsive steps occur,
the probability for full synchronization here is much smaller than for a
successful simple attack. In fact, large enough fluctuations which lead from
$\rho^{AC} = 0$ to $\rho^{AC} = 1$ without interaction only occur in small
systems. But the common input sequence causes correlations between
$\mathbf{w}_i^A$ and $\mathbf{w}_i^C$ even for $L \gg 1$ and $N \gg 1$.

This is clearly visible in figure 4.13. However, if Hebbian or anti-Hebbian
learning is used instead of the random walk learning rule, one observes a
somewhat different behavior: the fixed point of the dynamics for $K = 3$ is
located at $\rho_f = 0$. According to these two learning rules the weights of
corresponding hidden units move in opposite directions, if both
$\tau^A \neq \tau^C$ and $\sigma_i^A \neq \sigma_i^C$. The average step size of
such an inverse attractive step is given by

$$\langle \Delta\rho_i(\rho) \rangle = - \langle \Delta\rho_a(-\rho) \rangle . \tag{4.26}$$



[Plot for figure 4.12: average change $\langle \Delta\rho^{AC} \rangle$
(vertical axis, from -0.08 to 0.08) as a function of the overlap $\rho^{AC}$
for Hebbian learning, anti-Hebbian learning, and the random walk learning
rule.]

Figure 4.12: Average change of the overlap between A and C for $K = 3$,
$L = 3$, and $N = 1000$, obtained in 100 simulations with 100 pairs of neural
networks.

Figure 4.13: Distribution of the overlap $\rho^{AC}$ after 1000 steps for
$K = 3$, $L = 3$, and $N = 100$, obtained in 100 simulations with 100 pairs of
Tree Parity Machines.

While $P_r$ is independent of the learning rule, one finds

iK-l)/2 X 

= I H (\. (4.27) 

for Hebbian or anti-Hebbian learning and $K > 1$. If $K$ is odd, the effects
of all types of steps cancel out exactly at $\rho = 0$, because
$\langle \Delta\rho_r(0) \rangle = 0$ and $P_i(0) = P_a(0)$. Otherwise, the two
transition probabilities, $P_a(0)$ for attractive steps and $P_i(0)$ for
inverse attractive steps, are only approximately equal. Thus one observes
$\rho_f \approx 0$ independent of $K$, so that the correlations between A and C
are smaller in this case than for the random walk learning rule.

But if the initial overlap between A and C is already large, it is very likely
that both pairs of Tree Parity Machines generate the same key regardless of the
learning rule. Consequently, the number of keys $n_{key}$ is smaller than the
number of weight configurations $n_{conf} = (2L + 1)^{KN}$ of a Tree Parity
Machine.



4.3.2 Effective key length 

In order to further analyze the correlations between A's and C's neural
networks the entropy

$$S^{AC} = \sum_{i=1}^{K} S_i^{AC} \tag{4.29}$$

of their weight distribution is used. Here $S_i^{AC}$ is the entropy of a
single hidden unit defined in (2.15). Of course, one can assume here that the
weights stay uniformly distributed during the process of synchronization,
either because the system size is large ($N \gg 1$) or the random walk learning
rule is used. Therefore the entropy of a single network is given by
$S_0 = K N \ln(2L + 1)$.

Consequently, $S^{AC} - S_0$ is the part of the total entropy, which describes
the correlations caused by using a common input sequence. It is proportional to
the effective length of the generated cryptographic key,

$$l_{key} = \frac{S^{AC} - S_0}{\ln 2} , \tag{4.30}$$

which would be the average number of bits needed to represent it using both
an optimal encoding without redundancy and the input sequence as additional
knowledge. If the possible results of the neural key-exchange protocol are
uniformly distributed, each one can be represented by a number consisting of
$l_{key}$ bits. In this case

$$n_{key} = 2^{l_{key}} = e^{S^{AC} - S_0} \tag{4.31}$$

describes exactly the number of keys which can be generated using different
initial weights for the same input sequence. Otherwise, the real number is
larger, because mainly configurations which occur with high probability are
relevant for the calculation of $S^{AC}$. However, an attacker is only
interested in those prevalent keys. Therefore $n_{key}$ defined in equation
(4.31) is, in fact, a lower bound for the number of cryptographically relevant
configurations.

Figure 4.14: Entropy per weight for A and C for $K = 3$, $L = 3$, and random
walk learning rule, obtained in 100 simulations with 100 pairs of neural
networks.

Figure 4.14 shows the time evolution of the entropy. First $S^{AC}$ shrinks
linearly with increasing $t$, as the overlap $\rho$ between A and C grows,
while it approaches the stationary state. This behavior is consistent with an
exponentially decreasing number of keys, which can be directly observed in very
small systems as shown in figure 4.15. Of course, after the system has reached
the fixed point, the entropy stays constant. This minimum value of the entropy
is then used to determine the effective number $n_{key}$ of keys according to
(4.31).

It is clearly visible that there are two scaling relations for $S^{AC}$:

• Entropy is an extensive quantity. Thus both $S^{AC}$ and $S_0$ are
proportional to the number of weights $N$. Consequently, the number of keys,
which can be generated by the neural key-exchange protocol for a given input
sequence, grows exponentially with increasing system size $N$.

• The relevant time scale for all processes related to the synchronization of
Tree Parity Machines is defined by the step sizes of attractive and repulsive
steps, which are asymptotically proportional to $L^{-2}$. Therefore the time
needed to reach the fixed point $\rho_f^{AC}$ is proportional to $L^2$, similar
to $\langle t_{sync} \rangle$. In fact, it is even of the same order as the
average synchronization time.

Figure 4.15: Number of keys for $K = 3$, $L = 1$, $N = 2$, and random walk
learning rule, obtained by exhaustive search and averaged over 100 random input
sequences.



Instead of using the entropy directly, it is better to look at the mutual
information $I^{AC} = 2 S_0 - S^{AC}$ shared by A and C, which comes from the
common input sequence and is visible in the correlations of the weight vectors.
Using (2.18) and (4.31) leads to [31]

$$I^{AC} = - \ln \left( \frac{n_{key}}{n_{conf}} \right) . \tag{4.32}$$

Therefore the effective number of keys is given by

$$n_{key} = n_{conf} \, e^{-I^{AC}} = (2L + 1)^{KN} e^{-I^{AC}} . \tag{4.33}$$
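The step from the entropies to the mutual information can be written out
explicitly. The following short derivation is a sketch which assumes uniformly
distributed weights, so that $S_0 = K N \ln(2L + 1) = \ln n_{conf}$, and uses
the relation $n_{key} = e^{S^{AC} - S_0}$ for the effective number of keys:

```latex
\begin{align*}
I^{AC} &= 2 S_0 - S^{AC} = S_0 - \left( S^{AC} - S_0 \right) \\
       &= \ln n_{\mathrm{conf}} - \ln n_{\mathrm{key}}
        = - \ln \frac{n_{\mathrm{key}}}{n_{\mathrm{conf}}} .
\end{align*}
```

Solving the last line for $n_{key}$ gives
$n_{key} = n_{conf} \, e^{-I^{AC}}$, so a larger mutual information between A
and C directly reduces the number of effectively available keys.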

As shown in figure 4.16, the mutual information $I^{AC}$ at the end of the
synchronization process becomes asymptotically independent of the synaptic
depth in the limit $L \to \infty$. Consequently, the ratio $n_{key}/n_{conf}$
is constant except for finite-size effects occurring in small systems.

The amount of correlations between A and C depends on the distribution of
the overlap $\rho^{AC}$ in the steady state, which can be described by its
average value $\rho_f$ and its standard deviation $\sigma_f$. As before,
$\sigma_f$ decreases proportional to $L^{-1}$ due to diminishing fluctuations
in the limit $L \to \infty$, while $\rho_f$ stays nearly constant. Hence
$I^{AC}$ consists of two parts, one independent of $L$ and one proportional to
$L^{-1}$, as shown in figure 4.17.

[Plot for figure 4.16: curves for $L = 1, 2, 3, 4, 5$.]

Figure 4.16: Mutual information between A and C for $K = 3$, $N = 1000$, and
random walk learning rule, obtained in 1000 simulations with 10 pairs of neural
networks.

been obtained in 1000 simulations with 10 pairs of neural networks.

In the case of the random walk learning rule the mutual information increases
with $L$, because fluctuations are reduced which just disturb the correlations
created by the common sequence of input vectors. Extrapolating $I^{AC}$ yields
the result [31]

$$n_{key} \approx \left[ 0.66 \, (2L + 1)^3 \right]^N , \tag{4.34}$$

which is valid for $K = 3$ and $1 \ll L \ll \sqrt{N}$. Consequently, $n_{key}$
grows exponentially with $N$, so that there are always enough possible keys in
larger systems to prevent successful brute-force attacks on the neural
key-exchange protocol.
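To get a feeling for the size of this number, the extrapolated result can be
converted into an effective key length in bits. The helper functions below are
hypothetical names introduced for this sketch, and the example parameters are
chosen for illustration only. The prefactor 0.66 is the value quoted above for
$K = 3$ and the random walk learning rule, so the estimate should not be used
outside the range $1 \ll L \ll \sqrt{N}$.

```python
import math


def effective_key_bits(L, N, prefactor=0.66):
    """log2 of n_key from (4.34); only meaningful for K = 3."""
    return N * math.log2(prefactor * (2 * L + 1) ** 3)


def configuration_bits(L, N, K=3):
    """log2 of n_conf = (2L + 1)**(K * N)."""
    return K * N * math.log2(2 * L + 1)


if __name__ == "__main__":
    L, N = 5, 1000  # hypothetical example values
    key_bits = effective_key_bits(L, N)
    conf_bits = configuration_bits(L, N)
    print(f"n_conf ~ 2^{conf_bits:.0f}, n_key ~ 2^{key_bits:.0f}")
    print(f"bits lost to the public inputs: {conf_bits - key_bits:.0f}")
```

In this approximation the common input sequence costs a fixed number of bits
per weight column, $\log_2(1/0.66)$ for each of the $N$ input positions,
independent of the synaptic depth.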

Using Hebbian or anti-Hebbian learning, however, improves the situation
further. Because of $\rho_f = 0$ one finds $n_{key} \to n_{conf}$ in the limit
$L \to \infty$. Therefore the input sequence does not restrict the set of
possible keys in very large systems using $K = 3$, $1 \ll L \ll \sqrt{N}$, and
one of these two learning rules.



4.4 Secret inputs 

The results of section 4.3 indicate that the input vectors are an important
source of information for the attacker. Thus keeping $\mathbf{x}_i$ at least
partially secret should improve the security of the neural key-exchange
protocol.



4.4.1 Feedback mechanism 

In order to reduce the amount of input vectors transmitted over the public
channel, the partners have to use an alternative source for the input bits. For
that purpose they can modify the structure of the Tree Parity Machines, which
is shown in figure 4.18 [19].

Figure 4.18: A Tree Parity Machine with $K = 3$, $N = 3$, and feedback.

Here the generation of the input values is different. Of course, A and B still
start with a set of $K$ randomly chosen public inputs $\mathbf{x}_i$. But in
the following time steps each input vector is shifted,

$$x_{i,j}^{A/B+} = x_{i,j-1}^{A/B} \quad \text{for } j > 1 , \tag{4.35}$$

and the output bit $\sigma_i$ of the corresponding hidden unit is used as the
new first component,

$$x_{i,1}^{A/B+} = \sigma_i^{A/B} . \tag{4.36}$$

Figure 4.19: Probability of synchronization for two Tree Parity Machines with
feedback as a function of the initial overlap $\rho_{start}$. Symbols denote
results obtained in 1000 simulations with $K = 3$, $L = 3$, and $N = 100$.



This feedback mechanism [38] replaces the public sequence of random input
vectors. Additionally, the anti-Hebbian learning rule (2.7) is used to update
the weights. By doing so one avoids the generation of trivial keys, which would
be the result of the other learning rules [19]. Thus the hidden units of both
Tree Parity Machines work as confused bit generators [39].

However, synchronization is not possible without further information, as the
bit sequence produced by such a neural network is unpredictable [38, 40, 41]
for another one of the same type [19, 39]. This is clearly visible in figure
4.19. The reason is that the input vectors of A's and B's Tree Parity Machines
become more and more different, because each occurrence of
$\sigma_i^A \neq \sigma_i^B$ reduces the number of identical input bits by one
for the next $N$ steps. Of course, the partners disagree on the outputs
$\sigma_i$ quite often at the beginning of the synchronization process, so that
they soon have completely uncorrelated input vectors and mutual learning is no
longer possible.



4.4.2 Synchronization with feedback 

As the feedback mechanism destroys the common information about the inputs
of the Tree Parity Machines, an additional mechanism is necessary for
synchronization, which compensates this detrimental effect sufficiently. For
that purpose A and B occasionally reset the input vectors of their Tree Parity
Machines if too many steps with $\tau^A \neq \tau^B$ occur.

Figure 4.20: Average synchronization time and its standard deviation for neural
cryptography with feedback, obtained in 10 000 simulations with $K = 3$ and
$N = 10\,000$.



In fact, the following algorithm is used [19]:

• If $\tau^A = \tau^B$, the weights are updated according to the anti-Hebbian
learning rule (2.7) and the feedback mechanism is used to generate the next
input.

• If the output bits disagree, $\tau^A \neq \tau^B$, the input vectors are
shifted, too, but all pairs of input bits $x_{i,1}^{A/B}$ are set to common
public random values.

• After $R$ steps with different output, $\tau^A \neq \tau^B$, all inputs are
reinitialized using a set of $K$ randomly chosen public input vectors.

Of course, setting $R = 0$ leads to synchronization without feedback, while no
synchronization is possible in the limit $R \to \infty$.
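The three rules above can be summarized as a small update function for the
input vectors of one partner. This is a sketch under stated assumptions rather
than the reference implementation: the function name and its arguments are
hypothetical, the weight update itself is omitted, $R \geq 1$ is assumed, and
the counter is taken to count all disagreeing steps since the last reset, which
the text does not specify.

```python
def next_inputs(x, sigma, outputs_agree, mismatches, R, public_bits, public_vectors):
    """One input update for a single party (A or B).

    x               current input vectors, K lists of N values +1/-1
    sigma           this party's own hidden-unit outputs
    outputs_agree   True if tau^A == tau^B in this step
    mismatches      disagreeing steps counted since the last reset (assumption)
    public_bits     K public random bits, used when the outputs disagree
    public_vectors  K public random input vectors, used for a full reset
    Returns the new input vectors and the new mismatch counter.
    """
    if outputs_agree:
        # Feedback: shift each vector, the own output becomes the first component.
        return [[s] + x_i[:-1] for s, x_i in zip(sigma, x)], mismatches
    mismatches += 1
    if mismatches >= R:
        # Too many disagreements: restart from common public input vectors.
        return [list(v) for v in public_vectors], 0
    # Disagreement: shift as well, but insert a common public random bit.
    return [[b] + x_i[:-1] for b, x_i in zip(public_bits, x)], mismatches


if __name__ == "__main__":
    x = [[1, -1, 1], [-1, -1, 1]]
    print(next_inputs(x, [-1, 1], True, 0, 3, [1, -1], [[1, 1, 1], [-1, -1, -1]]))
    print(next_inputs(x, [-1, 1], False, 0, 3, [1, -1], [[1, 1, 1], [-1, -1, -1]]))
    print(next_inputs(x, [-1, 1], False, 2, 3, [1, -1], [[1, 1, 1], [-1, -1, -1]]))
```

Both partners would call such a function with their own `sigma` but with the
same public values, so their inputs only diverge through the private feedback
bits.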

Figure 4.20 shows that using the feedback mechanism increases the average
number of steps needed to achieve full synchronization. While there are strong
finite-size effects, the scaling relation
$\langle t_{sync} \rangle \propto L^2$ is still valid for
$1 \ll L \ll \sqrt{N}$. Only the constant of proportionality is larger than
before [19].

As shown in figure 4.21, a similar result can be observed in regard to the
success probability of the geometric attack. As before, $P_E$ drops
exponentially with increasing synaptic depth, so that A and B can achieve any
desired level of security by changing $L$. But with feedback smaller values of
$L$ are sufficient, because the factors $y_1$ and $y_2$ in the scaling law
(4.7) are larger. Therefore using the feedback mechanism improves the security
of neural cryptography by keeping input values partially secret.

However, A and B usually want to keep their effort constant. Then one has to
look at the function $P_E(\langle t_{sync} \rangle)$ instead of $P_E(L)$, which
is plotted in figure 4.22 for several values of the feedback parameter $R$. It
is clearly visible that $P_E(\langle t_{sync} \rangle)$ does not depend much on
$R$. Consequently, using feedback only yields a small improvement of security
unless the partners accept an increase of the average synchronization time
[19].

Figure 4.21: Success probability $P_E$ of the geometric attack as a function of
the synaptic depth $L$. Symbols denote results averaged over 10 000 simulations
for $K = 3$ and $N = 1000$.

Figure 4.22: Success probability $P_E$ of the geometric attack as a function of
the average synchronization time $\langle t_{sync} \rangle$. Symbols denote
results of 10 000 iterative calculations for $K = 3$ and $N \to \infty$. Here
successful synchronization has been defined as $\rho > 0.9$ [19].



4.4.3 Key exchange with authentication 

Synchronization of Tree Parity Machines by mutual learning only works if they
receive a common sequence of input vectors. This effect can be used to
implement an authentication mechanism for the neural key-exchange protocol
[42, 43].

For that purpose each partner uses a separate, but identical pseudo-random 
number generator. As these devices are initialized with a secret seed state shared 
by A and B, they produce exactly the same sequence of bits, which is then used 
to generate the input vectors $\mathbf{x}_i$ needed during the synchronization process. By
doing so A and B can synchronize their neural networks without transmitting 
input values over the public channel. 

Of course, an attacker does not know the secret seed state. Therefore E 
is unable to synchronize due to the lack of information about the input vectors. 
Even an active man-in-the-middle attack does not help in this situation, although 
it is always successful for public inputs. 

Consequently, reaching full synchronization proves that both participants
know the secret seed state. Thus A and B can authenticate each other by
performing this variant of the neural key exchange. As one cannot derive the
secret from the public output bits, it is a zero-knowledge protocol [42].
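The role of the shared seed can be illustrated with a few lines of code. In
this sketch the generator name `input_vectors` and the seed values are
hypothetical, and Python's `random.Random` only serves as a convenient stand-in
for the pseudo-random number generator. It is not cryptographically secure, so
a real implementation would need a generator suited for that purpose.

```python
import random


def input_vectors(seed, K, N):
    """Endless stream of K input vectors with N components +1/-1 each."""
    rng = random.Random(seed)  # assumption: stand-in for a secure generator
    while True:
        yield [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(K)]


if __name__ == "__main__":
    secret_seed = 1234  # shared by A and B in advance, never transmitted
    stream_a = input_vectors(secret_seed, K=3, N=10)
    stream_b = input_vectors(secret_seed, K=3, N=10)
    stream_e = input_vectors(99, K=3, N=10)  # E has to guess the seed
    x_a, x_b, x_e = next(stream_a), next(stream_b), next(stream_e)
    print(x_a == x_b, x_a == x_e)
```

Only partners holding the same seed obtain identical input vectors in every
step, which is the precondition for synchronization by mutual learning.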



Chapter 5 

Key exchange with queries 



The process of neural synchronization is driven by the sequence of input vectors, 
which are really used to adjust the weights of the Tree Parity Machines according 
to the learning rule. As these are selected by the partners participating in the 
key exchange, A and B have an important advantage over E, who can only listen 
to their communication. Up to now the partners just avoid repulsive steps by 
skipping some of the randomly generated input vectors. 

However, they can use their advantage in a better way. For this purpose the 



random inputs are replaced by queries [4J|, which A and B choose alternately ac 



cording to their own weight vectors. In fact, the partners ask each other questions 
and learn only the answers, on which they reach an agreement. 

Of course, the properties of the synchronization process now depend not only
on the synaptic depth L of the Tree Parity Machines, but also on the chosen
queries. Thus there is an additional parameter H, which fixes the absolute value
$|h_i|$ of the local fields in the neural network generating the current query. As
the prediction error of a hidden unit is a function of both the overlap $\rho_i$ and
the local field $h_i$, the partners modify the probability of repulsive steps
$P_r(\rho)$ if they change H. By doing so A and B are able to adjust the difficulty
of neural synchronization and learning [23].



In order to achieve a secure key exchange with queries the partners have to
choose the parameter H in such a way that they synchronize quickly, while an
attacker is not successful. Fortunately, this is possible for all known attacks [22].
Then one finds the same scaling laws again, which have been observed in the
case of synchronization with random inputs. But because of the new parameter
H one can reach a higher level of security for the neural key-exchange protocol
without increasing the average synchronization time [22, 23].

However, queries make additional information available to the attacker, as E 
now knows the absolute value of the local fields in either A's or B's hidden units. 
In principle, this information might be used in specially adapted methods. But 
knowing H does not help E in the case of the geometric attack and its variants, 
so that using queries does not introduce obvious security risks. 




5.1 Queries 

In the neural key-exchange protocol as proposed in [Ti*] the input vectors
$\mathbf{x}_i$ are generated randomly and independently of the current weight
vectors $\mathbf{w}_i^{A/B}$ of A's and B's Tree Parity Machines. Of course, by
interacting with each other the partners are able to select which inputs they
want to use for the movements of the weights. But they use their influence on
the process of synchronization only for skipping steps with $\tau^A \neq \tau^B$
in order to avoid repulsive effects. Although this algorithm for choosing the
relevant inputs is sufficient to achieve a more or less secure key-exchange
protocol, A and B could improve it by taking more information into account.

In contrast, E uses the local field $h_i^E$ of the hidden units in her Tree Parity
Machines in order to correct their output bits $\sigma_i^E$ if necessary. While this
algorithm, which is part of all known attack methods except the simple attack, is
not suitable for A and B, they could still use the information contained in
$h_i^{A/B}$. Then the probability for $\sigma_i^A \neq \sigma_i^B$ or
$\sigma_i^E \neq \sigma_i^{A/B}$ is no longer given by the generalization
error (3.20), but by the prediction error (3.30) of the perceptron [30].



Consequently, the partners are able to distinguish input vectors $\mathbf{x}_i$ which
are likely to cause either attractive or repulsive steps if they look at the local
field. In fact, A's and B's situation is quite similar to E's in the case of the
geometric attack. A low value of $|h_i^{A/B}|$ indicates a high probability for
$\sigma_i^A \neq \sigma_i^B$. These input vectors may slow down the process of
synchronization due to repulsive effects, so that it is reasonable to omit them.
And a high value of $|h_i^{A/B}|$ indicates that $\sigma_i^E = \sigma_i^{A/B}$ is
very likely, which would help E. Therefore A and B could try to select only input
vectors $\mathbf{x}_i$ with $|h_i| \approx H$ for the application of the learning
rule, where the parameter H has to be chosen carefully in order to improve the
security of the neural key-exchange protocol.

While it is indeed possible to use the random sequence of input vectors and
just skip unsuitable ones with $|h_i| \not\approx H$, this approach does not work
well. If the range of acceptable local fields is small, then a lot of steps do not
change the weights and $\langle t_\mathrm{sync} \rangle$ increases. But otherwise
only small effects can be observed, because most input vectors with
$\tau^A = \tau^B$ are accepted as before.



That is why the random inputs are replaced by queries [4J], so that the
partners ask each other questions, which depend on their own weight vectors
$\mathbf{w}_i^{A/B}$. In odd (even) steps A (B) generates K input vectors
$\mathbf{x}_i$ with $h_i^A \approx \pm H$ ($h_i^B \approx \pm H$) using the
algorithm presented in appendix C. By doing so it is not necessary to skip steps
in order to achieve the desired result: the absolute value of the local field $h_i$
is approximately given by the parameter H, while its sign $\sigma_i$ is chosen
randomly [23].
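
To make the notion of a query concrete, the following sketch builds an input vector whose local field is close to a prescribed value. It is not the algorithm from the appendix: `make_query` is a hypothetical greedy stand-in, and the normalization $h = \mathbf{w} \cdot \mathbf{x} / \sqrt{N}$ is assumed here.

```python
import numpy as np

def make_query(w, H, sign, rng):
    """Return x in {-1,+1}^N with (w @ x)/sqrt(N) close to sign*H.

    Greedy illustration only: start from a random x and flip single
    components as long as a flip moves w @ x closer to the target.
    """
    N = len(w)
    target = sign * H * np.sqrt(N)
    x = rng.choice((-1, 1), size=N)
    s = int(w @ x)
    while True:
        delta = -2 * w * x                     # change of w @ x per flip
        improves = np.abs(s + delta - target) < abs(s - target)
        candidates = np.flatnonzero(improves)
        if candidates.size == 0:               # no flip gets closer
            return x
        j = rng.choice(candidates)             # keep the choice random
        s += int(delta[j])
        x[j] = -x[j]

rng = np.random.default_rng(0)
L, N, H = 5, 1000, 1.77
w = rng.integers(-L, L + 1, size=N)            # one hidden unit's weights
x = make_query(w, H, +1, rng)
print((w @ x) / np.sqrt(N))                    # local field, close to H
```

The loop terminates because the distance to the target strictly decreases with every accepted flip and can take only finitely many values.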



As shown in figure 5.1, using queries affects the probability that two
corresponding hidden units disagree on their output $\sigma_i$. Compared to the
case of a random input sequence, this event occurs more frequently for small
overlap, but less for nearly synchronized neural networks. Hence queries are
especially a problem for the attacker. As learning is slower than synchronization,
$\rho_i^{AE}$ is typically smaller than $\rho_i^{AB}$. In this situation queries
increase the probability of repulsive steps for the attacker, while the partners are
able to regulate this effect by choosing H in a way that it does not interfere much
with their process of synchronization. Consequently, using queries gives A and B
more control over the difficulty of both synchronization and learning.

Figure 5.1: Probability of disagreeing hidden units in the case of queries with
different parameter H and Q = 1. The thick line shows
$P(\sigma_i^A \neq \sigma_i^B)$ for random inputs.
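
The comparison between random inputs and queries can be reproduced numerically. The sketch below assumes the standard perceptron expressions for the two errors referenced in the text, $\epsilon = \arccos(\rho)/\pi$ for the generalization error and $\epsilon^p = \frac{1}{2}[1 - \mathrm{erf}(\frac{\rho}{\sqrt{2(1-\rho^2)}} \frac{|h|}{\sqrt{Q}})]$ for the prediction error; these forms are not quoted in this chapter and are stated here as assumptions.

```python
import math

def generalization_error(rho):
    """Disagreement probability for random inputs (assumed form)."""
    return math.acos(rho) / math.pi

def prediction_error(rho, h, Q=1.0):
    """Disagreement probability for known |h| (assumed form, rho < 1)."""
    arg = rho / math.sqrt(2.0 * (1.0 - rho**2)) * abs(h) / math.sqrt(Q)
    return 0.5 * (1.0 - math.erf(arg))

H = 0.5  # illustrative value of the query parameter
for rho in (0.1, 0.5, 0.9, 0.99):
    print(rho, generalization_error(rho), prediction_error(rho, H))
```

For this choice of H the query curve lies above the random-input curve at small overlap and below it close to synchronization, which is the behavior described above.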



5.2 Synchronization time 

Because queries change the relation between the overlap $\rho_i^{AB}$ and the
probability of repulsive steps $P_r^B(\rho_i^{AB})$, using them affects the number
of steps needed to reach full synchronization. But the neural key-exchange
protocol is only useful in practice, if the synchronization time
$t_\mathrm{sync}$ is not too large. Otherwise, there is no advantage compared to
classical algorithms based on number theory. This condition, of course, restricts
the usable range of the new parameter H.

As shown in figure 5.2, $\langle t_\mathrm{sync} \rangle$ diverges for
$H \to 0$. In this limit the prediction error $\epsilon^p$ reaches 1/2 independent
of the overlap, so that the effect of the repulsive steps inhibits synchronization.
But as long as H is chosen large enough, it does not have much influence on the
effort of generating a key [23].

In fact, A and B can switch the mechanism of synchronization by modifying
H. This is clearly visible in figure 5.3. If the absolute value of the local fields
is so large that $\langle \Delta\rho \rangle > 0$ for all $\rho < 1$, synchronization
on average happens, which is similar to the normal key-exchange protocol using a
random sequence of input vectors. But decreasing H below a certain value $H_f$
creates a new fixed point of the dynamics at $\rho_f < 1$. In this case
synchronization is only possible by fluctuations. As the gap with
$\langle \Delta\rho \rangle < 0$ grows with decreasing $H < H_f$, one observes a
steep increase of the average synchronization time
$\langle t_\mathrm{sync} \rangle$. If A and B use the random walk learning rule
together with K = 3, L = 5, and N = 1000, one finds $H_f \approx 1.76$ in this case.

Figure 5.2: Synchronization time of two Tree Parity Machines with K = 3,
N = 1000, and random walk learning rule, averaged over 10 000 simulations.

Figure 5.3: Average change of the overlap for synchronization with queries using
K = 3, L = 5, N = 1000, and the random walk learning rule. Symbols denote
results obtained in 10 000 simulations, while the line shows
$\langle \Delta\rho \rangle$ for synchronization with random inputs.

Additionally, figure 5.2 shows a dependency of $\langle t_\mathrm{sync} \rangle$
on the synaptic depth L, which is caused by two effects [23]:



• The speed of synchronization is proportional to the step sizes
$\langle \Delta\rho_a \rangle$ for attractive and $\langle \Delta\rho_r \rangle$ for
repulsive steps. As shown in section 3.1.2, these quantities decrease
proportional to $L^{-2}$. Therefore the average synchronization time
increases proportional to the square of the synaptic depth as long as
$H > H_f$:

$$\langle t_\mathrm{sync} \rangle \propto L^2 \qquad (5.1)$$

This causes the vertical shift of the curves in figure 5.2.

• If queries are used, the probabilities $P_a$ for attractive and $P_r$ for repulsive
steps depend not only on the overlap $\rho_i$, but also on the quantity
$H/\sqrt{Q_i}$ according to (3.30). In the case of the random walk learning rule
the weights stay uniformly distributed, so that the length of the weight vectors
grows proportional to L as shown in section 3.1.1. That is why one has to
increase H proportional to the synaptic depth,

$$H = \alpha L \qquad (5.2)$$

in order to achieve identical transition probabilities and consequently the
same average synchronization time. This explains the horizontal shift of
the curves in figure 5.2.

Using both scaling laws (5.1) and (5.2) one can rescale
$\langle t_\mathrm{sync} \rangle$ in order to obtain functions $f_L(\alpha)$ which
are nearly independent of the synaptic depth except for finite-size effects (isj ]:

$$\langle t_\mathrm{sync} \rangle = L^2 f_L\!\left(\frac{H}{L}\right) \qquad (5.3)$$

Figure 5.4 shows these functions for different values of L. It is clearly visible
that $f_L(\alpha)$ converges to a universal scaling function $f(\alpha)$ in the limit
$L \to \infty$:

$$f(\alpha) = \lim_{L \to \infty} f_L(\alpha) \qquad (5.4)$$

Additionally, the finite-size effects have a similar behavior in regard to L as the
fluctuations of the overlap $\rho_i$, which have been analyzed in section 3.4.2. That
is why the distance $|f_L(\alpha) - f(\alpha)|$ shrinks proportional to $L^{-1}$.
Therefore the universal function $f(\alpha)$ can be determined by finite-size
scaling, which is shown in figure 5.4 too.

Figure 5.4: Scaling behavior of the synchronization time. The thick curve denotes
the universal function $f(\alpha)$ defined in (5.4). It has been obtained by
finite-size scaling, which is shown in the inset.

Figure 5.5: Extrapolation of the inverse function $f_L^{-1}$ to $L \to \infty$.
Symbols denote the values extracted from figure 5.4 for different average
synchronization times.

Figure 5.6: Synchronization time for neural cryptography with queries. These
results have been obtained in 100 simulations with K = 3 and L = 7.
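
The rescaling and the finite-size extrapolation can be sketched in a few lines. The example assumes that the deviation from the universal function shrinks like 1/L, so that a straight-line fit in 1/L yields the limit; the function names and the synthetic numbers are illustrative and not taken from the simulations.

```python
import numpy as np

def rescale(L, H, t_sync):
    """Map a measurement (L, H, <t_sync>) to (alpha, f_L(alpha))."""
    return H / L, t_sync / L**2

def extrapolate_to_infinite_L(L_values, f_values):
    """Fit f_L = f_inf + c / L at fixed alpha and return (f_inf, c)."""
    inv_L = 1.0 / np.asarray(L_values, dtype=float)
    c, f_inf = np.polyfit(inv_L, np.asarray(f_values, dtype=float), 1)
    return f_inf, c

# Synthetic values of f_L at one fixed alpha (made up for illustration).
Ls = [3, 5, 7, 10]
fs = [2.0 + 3.0 / L for L in Ls]
print(extrapolate_to_infinite_L(Ls, fs))   # intercept estimates f(alpha)
```

Repeating the fit for each value of $\alpha$ gives an estimate of the whole curve $f(\alpha)$.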

This function diverges for $\alpha < \alpha_c$. The critical value
$\alpha_c = H_c/L$ can be estimated by extrapolating the inverse function
$f_L^{-1}$, which is shown in figure 5.5. By doing so one finds
$\alpha_c \approx 0.31$ for K = 3 and N = 1000, if A and B use the random walk
learning rule [22]. Consequently, synchronization is only achievable for
$H > \alpha_c L$ in the limit $L \to \infty$. However, in the case of finite
synaptic depth synchronization is even possible slightly below $\alpha_c L$ due to
fluctuations [23].

Although the weights do not stay uniformly distributed in the case of Hebbian
and anti-Hebbian learning, one observes qualitatively the same behavior of
$\langle t_\mathrm{sync} \rangle$ as a function of the parameters H and L. This is
clearly visible in figure 5.6. As the length of the weight vectors is changed by
these learning rules, the critical local field $H_c = \alpha_c L$ for
synchronization is different. In the case of K = 3 and N = 1000, one finds
$\alpha_c \approx 0.36$ for Hebbian learning [23] and $\alpha_c \approx 0.25$ for
anti-Hebbian learning. But in the limit $N \to \infty$ the behavior of both learning
rules converges to that of the random walk learning rule 2^, which is also visible
in figure 5.6.



5.3 Security against known attacks 

Because of the cryptographic application of neural synchronization it is important
that the key-exchange protocol using queries is not only efficient, but also secure
against the attacks known up to now. Therefore it is necessary to determine how
different absolute values of the local field influence the security of the system.
Of course, the results impose further restrictions upon the usable range of the
parameter H.

Figure 5.7: Average change of the overlap for K = 3, L = 5, N = 1000, H = 1.77,
and M = 100. Symbols denote results obtained in 200 simulations using the
random walk learning rule.



5.3.1 Dynamics of the overlap 

Replacing random inputs with queries gives A and B an additional advantage
over E. Now they can choose a suitable value of the new parameter H, which
influences the probability of repulsive steps as shown in figure 5.1.
By doing so the partners are able to modify the dynamics of the synchronization
process, not only for themselves, but also for an attacker. And because
$\langle \Delta\rho^{AB}(\rho) \rangle$ is greater than
$\langle \Delta\rho^{AE}(\rho) \rangle$, A and B can generate queries in such a way
that a fixed point of the dynamics at $\rho_f < 1$ only exists for E. Then the neural
key-exchange protocol is secure in principle, because
$\langle t_\mathrm{sync}^E \rangle$ grows exponentially with increasing synaptic
depth while $\langle t_\mathrm{sync}^B \rangle \propto L^2$.

Figure 5.7 shows that this is indeed possible. Here A and B have chosen
$H \approx H_f$, so that they just synchronize on average. In contrast, E can reach
the absorbing state at $\rho = 1$ only by fluctuations, as there is a fixed point of
the dynamics at $\rho_f < 1$ for both the geometric attack and the majority attack.
In principle, this situation is similar to that observed in the case of random inputs.
However, the gap between the fixed point and the absorbing state is larger, so
that the success probability of both attacks is decreased. This is clearly visible
by comparing figure 5.3 with figure 3.16 and figure 4.7.

Figure 5.8: Success probability $P_E$ as a function of H. Symbols denote the results
obtained in 1000 simulations using K = 3, L = 10, N = 1000, and the random
walk learning rule, while the lines show fit results for model (5.5). The number
of attacking networks is M = 4096 for the genetic attack and M = 100 for the
majority attack.



5.3.2 Success probability 

In practice it is necessary to look at the success probability $P_E$ of the known
attacks in order to determine the level of security provided by neural cryptography
with queries.

As shown in figure 5.8, E is nearly always successful in the case of large H,
because she is able to synchronize on average similar to A and B. But if H is
small, the attacker can reach full synchronization only by fluctuations, so that
$P_E$ drops to zero. In fact, one can use a Fermi-Dirac distribution

$$P_E = \frac{1}{1 + \exp(-\beta (H - \mu))} \qquad (5.5)$$

as a suitable fitting function in order to describe $P_E$ as a function of H. This
model is suitable for both the majority attack [23] and the genetic attack [22].
Of course, one can also use it to describe $P_E(H)$ of the geometric attack, which
is the special case M = 1 of the more advanced attacks. Comparing these curves
in figure 5.8 reveals directly that the genetic attack is the best choice for the
attacker in this case.
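
A fit of this step-like model can be sketched without a nonlinear optimizer, because the logit of the success probability is linear in H: $\ln[P_E/(1-P_E)] = \beta (H - \mu)$. The code below is a minimal illustration on synthetic data; the function names and the parameter values are assumptions, and measured probabilities of exactly 0 or 1 would have to be excluded before taking the logarithm.

```python
import numpy as np

def success_probability(H, mu, beta):
    """Fermi-Dirac shaped model for the success probability."""
    return 1.0 / (1.0 + np.exp(-beta * (H - mu)))

def fit_fermi_dirac(H, P):
    """Estimate (mu, beta) from a straight-line fit of logit(P) vs. H."""
    H = np.asarray(H, dtype=float)
    P = np.asarray(P, dtype=float)
    logit = np.log(P / (1.0 - P))
    slope, intercept = np.polyfit(H, logit, 1)
    return -intercept / slope, slope      # mu, beta

# Synthetic data generated from the model itself (illustrative values).
H = np.linspace(2.0, 4.0, 9)
P = success_probability(H, mu=3.0, beta=4.0)
print(fit_fermi_dirac(H, P))
```

With noisy simulation data a weighted or nonlinear fit would be preferable, since the logit transform amplifies errors near 0 and 1.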

Additionally, one observes a similar behavior for all three learning rules. This
is clearly visible in figure 5.9. Only the fit parameters are different due to the
changed length of the weight vectors. Hebbian learning increases $Q_i$, so that a
higher value of H is needed in order to achieve the same value of the success
probability. In contrast, the anti-Hebbian learning rule decreases $Q_i$, so that one
observes a similar behavior with a lower value of H. Consequently, equation (5.5)
is a universal model, which describes the success probability $P_E$ as a function of
the absolute local field H for all known attacks.

Figure 5.9: Success probability $P_E$ of the majority attack for K = 3, L = 10,
N = 1000, and M = 100. Symbols denote results obtained in 10 000 simulations,
while lines represent fits with model (5.5).

However, it is not sufficient to know the fit parameters $\mu$ and $\beta$ for only
one value of the synaptic depth. In order to estimate the security of the neural
key-exchange protocol with queries, one has to look at the scaling behavior of
these quantities in regard to L.

Figure 5.10 shows that increasing the synaptic depth does not change the
shape of $P_E(H)$ much, so that the steepness $\beta$ is nearly constant for L > 3.
But there is a horizontal shift of the curves due to the growing length of the weight
vectors. In fact, the position $\mu$ of the smooth step increases linearly with the
synaptic depth L,

$$\mu = \alpha_s L + \delta \qquad (5.6)$$

which is shown in figure 5.11. As before, the method chosen by E does not matter,
because equation (5.6) is valid in all cases [22, 23]. Only the parameters
$\alpha_s$ and $\delta$ depend on the learning rule and the attack. This is clearly
visible in figure 5.12 and in figure 5.13.

Combining (5.5) and (5.6) yields

$$P_E = \frac{1}{1 + \exp(\beta\delta) \exp(\beta (\alpha_s - \alpha) L)} \qquad (5.7)$$

for the success probability of any known attack. As long as A and B choose
$\alpha = H/L$ according to the condition $\alpha < \alpha_s$, $P_E$ vanishes for
$L \to \infty$. In this case its asymptotic behavior is given by

$$P_E \sim e^{-\beta\delta} e^{-\beta (\alpha_s - \alpha) L} \qquad (5.8)$$

which is consistent with the observation

$$P_E \sim e^{-y (L - L_0)} \qquad (5.9)$$

found for neural cryptography with random inputs 1^. Comparing the
coefficients in both equations reveals

$$y = \beta (\alpha_s - \alpha) \qquad (5.10)$$
$$L_0 = -\delta / (\alpha_s - \alpha) \qquad (5.11)$$

Figure 5.10: Success probability of the geometric attack for K = 3, N = 1000,
and the Hebbian learning rule. Symbols denote results obtained in 10 000
simulations, while lines show fits with model (5.5).

Figure 5.11: Parameters $\mu$ and $\beta$ as a function of the synaptic depth L for
the geometric attack. Symbols denote the results of fits using model (5.5), based on
10 000 simulations with K = 3 and N = 1000.

Figure 5.12: Parameters $\mu$ and $\beta$ as a function of L for the genetic attack
with K = 3, N = 1000, and M = 4096. The symbols represent results from 1000
simulations and the lines show a fit using the model given in (5.6).

Figure 5.13: Parameters $\mu$ and $\beta$ as a function of the synaptic depth L for
the majority attack. Symbols denote the results of fits using model (5.5), based on
10 000 simulations with K = 3, N = 1000, and M = 100.

Thus replacing random inputs with queries gives A and B direct influence on the
scaling of $P_E$ in regard to L, as they can change y by modifying $\alpha$. Finally,
the results indicate that there are two conditions for a fast and secure key-exchange
protocol based on neural synchronization with queries:

• As shown in section 5.2, the average synchronization time
$\langle t_\mathrm{sync} \rangle$ diverges in the limit $L \to \infty$, if H is too
small. Therefore A and B have to choose this parameter according to
$H > \alpha_c L$.

• And if H is too large, the key exchange becomes insecure, because $P_E = 1$
is reached in the limit $L \to \infty$. So the partners have to fulfill the condition
$H < \alpha_s L$ for all known attacks.

Fortunately, A and B can always choose a fixed $\alpha = H/L$ according to

$$\alpha_c < \alpha < \alpha_s \qquad (5.12)$$

as there is no known attack with $\alpha_s \le \alpha_c$. Then
$\langle t_\mathrm{sync} \rangle$ grows proportional to $L^2$, but $P_E$ drops
exponentially with increasing synaptic depth. Consequently, A and B can reach
any desired level of security by just changing L [23].
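
The relation between the step model and the exponential scaling law can be checked numerically. The sketch below evaluates $P_E = 1/(1 + e^{\beta\delta} e^{\beta(\alpha_s - \alpha)L})$ and compares it with $e^{-y(L - L_0)}$ using $y = \beta(\alpha_s - \alpha)$ and $L_0 = -\delta/(\alpha_s - \alpha)$. All parameter values are hypothetical and chosen only for illustration.

```python
import math

def p_success(L, alpha, alpha_s, beta, delta):
    """Success probability of an attack at fixed alpha = H / L."""
    return 1.0 / (1.0 + math.exp(beta * delta)
                  * math.exp(beta * (alpha_s - alpha) * L))

def scaling_parameters(alpha, alpha_s, beta, delta):
    """Coefficients y and L0 of the asymptotic law exp(-y (L - L0))."""
    y = beta * (alpha_s - alpha)
    L0 = -delta / (alpha_s - alpha)
    return y, L0

# Hypothetical fit parameters, not values from the simulations.
alpha, alpha_s, beta, delta = 0.32, 0.40, 8.0, 1.0
y, L0 = scaling_parameters(alpha, alpha_s, beta, delta)
for L in (10, 30, 60):
    exact = p_success(L, alpha, alpha_s, beta, delta)
    print(L, exact, math.exp(-y * (L - L0)))
```

Moving $\alpha$ closer to $\alpha_c$ increases y, which is how the partners steepen the exponential decay of the success probability.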



5.3.3 Optimal local field 

For practical aspects of security, however, it is important to look at the relation
between the average synchronization time and the success probability, as a too
complex key-exchange protocol is nearly as unusable as an insecure one. That
is why A and B want to minimize $P_E$ for a given value of
$\langle t_\mathrm{sync} \rangle$ by choosing L and H appropriately. These optimum
values can be determined by analyzing the function
$P_E(\langle t_\mathrm{sync} \rangle)$.

Figure 5.14 shows the result for the geometric attack. The optimum value of
H lies on the envelope of all functions $P_E(\langle t_\mathrm{sync} \rangle)$. This
curve is approximately given by $H = \alpha_c L$, as this choice maximizes
$\alpha_s - \alpha$, while synchronization is still possible [23]. It is also clearly
visible that queries improve the security of the neural key-exchange protocol
greatly for a given average synchronization time.

Figure 5.14: Success probability of the geometric attack as a function of
$\langle t_\mathrm{sync} \rangle$. Symbols denote results obtained in 10 000
simulations using the Hebbian learning rule, K = 3, and N = 1000. The solid curve
represents $P_E$ in the case of random inputs and the dashed line marks H = 0.36 L.

Figure 5.15: Success probability of the majority attack as a function of
$\langle t_\mathrm{sync} \rangle$. Symbols denote results obtained in 10 000
simulations using the Hebbian learning rule, K = 3, M = 100, and N = 1000. The
solid curve represents $P_E$ in the case of random inputs and the dashed line marks
H = 0.36 L.

Figure 5.16: Success probability of the genetic attack as a function of
$\langle t_\mathrm{sync} \rangle$. Symbols denote results obtained in 1000
simulations using the random walk learning rule, K = 3, M = 4096, and N = 1000.
The solid curve represents $P_E$ in the case of random inputs and the dashed line
marks H = 0.32 L.

A similar result is obtained for the majority attack. Here figure 5.15 shows
that the partners can even do better by using queries with $H < \alpha_c L$ as long
as L is not too large. This effect is based on fluctuations which enable
synchronization, but vanish in the limit $L \to \infty$. Thus the optimum value of H
is still given by $H \approx \alpha_c L$ if $L \gg 1$. Additionally, figure 5.15
indicates that A and B can even employ the Hebbian learning rule for neural
cryptography with queries, which led to an insecure key-exchange protocol in the
case of random inputs [21, 23].



5.3.4 Genetic attack 

Compared to the other methods, the genetic attack differs in two respects.
First, it is especially successful if L is small. That is why A and B have to use
Tree Parity Machines with large synaptic depth L regardless of the parameter
H. Of course, this sets a lower limit for the effort of performing the neural
key-exchange protocol as shown in figure 5.16.

Second, the genetic attack is a rather complicated algorithm with a lot of
parameters. Of course, E tries to optimize them in order to adapt to special
situations. Here the number M of attacking networks is clearly the most important
parameter, because it limits the number of mutation steps $t_s$ which can occur
between two selection steps:

$$t_s \le \frac{1}{K - 1} \frac{\ln M}{\ln 2} \qquad (5.13)$$

Thus E can test different variants of the internal representation
$(\sigma_1, \ldots, \sigma_K)$ for at most $t_s$ steps, before she has to select the
fittest Tree Parity Machines. And more time results in better decisions eventually.
Therefore one expects that E can improve $P_E$ by increasing M similar to the effect
observed for random inputs in section 4.1.2.

Figure 5.17: Parameters $\mu$ and $\beta$ as a function of L for the genetic attack
with K = 3, N = 1000, and the random walk learning rule. Symbols denote results of
fitting simulation data with (5.5) and the lines were calculated using the model
given in (5.6).
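
The limit on the number of mutation steps can be illustrated with integer arithmetic. The sketch assumes that every mutation step replaces each attacking network by $2^{K-1}$ variants, one for each internal representation compatible with the observed output, so that the population after t steps is $2^{(K-1)t}$; the function name is made up for this example.

```python
def max_mutation_steps(M, K):
    """Largest t such that 2**((K - 1) * t) networks still fit into M."""
    t = 0
    while 2 ** ((K - 1) * (t + 1)) <= M:
        t += 1
    return t

for M in (100, 4096, 65536):
    print(M, max_mutation_steps(M, K=3))
```

Doubling M therefore adds only a fraction of a mutation step, which is why the benefit of more attacking networks grows only logarithmically.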

Figure 5.17 shows that this is indeed the case. While $\alpha_s$ stays constant, the
offset $\delta$ decreases with increasing M. As before, it is a logarithmic effect,

$$\delta(M) = \delta(1) - \delta_E \ln M \qquad (5.14)$$

which is clearly visible in figure 5.18. Therefore E gains a certain horizontal shift
$\delta_E \ln 2$ of the smooth step function $P_E(H)$ by doubling the effort used for
the genetic attack ^]. Combining (5.14) and (5.7) yields

$$P_E = \frac{1}{1 + \exp(\beta (\delta(1) - \delta_E \ln M)) \exp(\beta (\alpha_s - \alpha) L)} \qquad (5.15)$$

for the success probability of this method. Then the asymptotic behavior for
$L \gg 1$ is given by

$$P_E \sim e^{-\beta (\delta(1) - \delta_E \ln M)} e^{-\beta (\alpha_s - \alpha) L} \qquad (5.16)$$

as long as $\alpha < \alpha_s$. Similar to neural cryptography with random inputs E
has to increase the number of attacking networks exponentially,

$$M \propto e^{[(\alpha_s - \alpha)/\delta_E] L} \qquad (5.17)$$

in order to maintain a constant success probability $P_E$, if A and B change the
synaptic depth L. But, due to limited computer power, this is often not feasible.

Figure 5.18: Offset $\delta$ as a function of the number of attackers M, for the
genetic attack with K = 3, N = 1000, and the random walk learning rule. Symbols and
the line were obtained by a fit with (5.14).
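
The exponential cost for the attacker can be made explicit with a short calculation. Assuming the asymptotic form $P_E \sim e^{-\beta(\delta(1) - \delta_E \ln M)} e^{-\beta(\alpha_s - \alpha)L}$, keeping $P_E$ fixed when L changes requires $\delta_E \ln M$ to grow by $(\alpha_s - \alpha)\,\Delta L$. The parameter values below are hypothetical placeholders, not fit results.

```python
import math

def p_asymptotic(L, M, alpha, alpha_s, beta, delta_1, delta_E):
    """Asymptotic success probability of the genetic attack (L >> 1)."""
    return math.exp(-beta * (delta_1 - delta_E * math.log(M))
                    - beta * (alpha_s - alpha) * L)

def required_networks(M_ref, L_ref, L_new, alpha, alpha_s, delta_E):
    """Number of networks that keeps P_E constant when L changes."""
    return M_ref * math.exp((alpha_s - alpha) * (L_new - L_ref) / delta_E)

# Hypothetical parameters chosen only to show the trend.
alpha, alpha_s, beta, delta_1, delta_E = 0.32, 0.40, 8.0, 2.0, 0.1
M_new = required_networks(4096, 10, 12, alpha, alpha_s, delta_E)
print(M_new / 4096)   # growth factor for two additional units of L
```

Under these assumptions every additional unit of synaptic depth multiplies the required number of networks by the constant factor $e^{(\alpha_s - \alpha)/\delta_E}$.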

However, the attacker could also try to improve $P_E$ by changing the other
parameters U and V of the genetic attack. Instead of the default values U = 10,
V = 20, E could use U = 30, V = 50, which maximize $P_E$ without greatly
changing the complexity of the attack [22]. But this optimal choice, which is
clearly visible in figure 5.19, does not help much as shown in figure 5.17. Only
$\beta$ is lower for the optimized attack, while $\alpha_s$ remains nearly the same.
Therefore the attacker gains little, as the scaling relation (5.17) is not affected.
Consequently, the neural key-exchange protocol with queries is even secure against
an optimized variant of the genetic attack in the limit $L \to \infty$.

5.3.5 Comparison of the attacks 

Of course, the opponent E always employs the best method which is available to
her in regard to computing power and other resources. Therefore it is necessary
to compare all known attack methods in order to estimate the level of security
achieved by a certain set of parameters.

Figure 5.19: Success probability of the genetic attack in the case of K = 3, L = 7,
N = 1000, M = 4096, H = 2.28, and random walk learning rule. These results
were obtained by averaging over 100 simulations.

Figure 5.20: Success probability of different attacks as a function of the synaptic
depth L. Symbols denote results obtained in 1000 simulations using the random
walk learning rule, K = 3, H = 0.32 L, and N = 1000, while the lines show fit
results for model (5.9). Here E has used M = 4096 networks for the genetic
attack and M = 100 for the majority attack.

The result for neural cryptography with queries is shown in figure 5.20. It is
qualitatively similar to that observed in section 4.1.4 in the case of
synchronization with random inputs. As the majority attack has the minimum value
of $\alpha_s$, it is usually the best method for the attacker. Only if A and B use
Tree Parity Machines with small synaptic depth, the genetic attack is better.

However, comparing figure 5.20 with the corresponding figure for random inputs
reveals that there are quite large quantitative differences, as replacing random
inputs with queries greatly improves the security of the neural key-exchange
protocol. Extrapolation of (5.9) shows that Pe ~ 10"^ is achieved for K = 3,
L = 18, N = 1000, H = 5.76, and random walk learning rule. This is much easier to
realize than L = 57, which would be necessary in order to reach the same level of
security in the case of random inputs.

5.4 Possible security risks 

Although using queries improves the security of the neural key-exchange protocol
against known attacks, there is a risk that a clever attacker may improve the
success probability $P_E$ by using additional information revealed through the
algorithm generating the input vectors. Two obvious approaches are analyzed here.
First, E could use her knowledge about the absolute local field H to improve the
geometric correction of the internal representation
$(\sigma_1^E, \ldots, \sigma_K^E)$. Second, each input vector $\mathbf{x}_i$ is
somewhat correlated to the corresponding weight vector $\mathbf{w}_i$ of
the generating network. This information could be used for a new attack method.

5.4.1 Known local field 

If the partners use queries, the absolute value of the local field in either A's or
B's hidden units is given by H. And E knows the local fields $h_i^E$ in her own Tree
Parity Machine. In this situation the probability of $\sigma_i^E \neq \sigma_i^A$ is
no longer given by (3.20) or (3.30), if it is A's turn to generate the input vectors.
Instead, one finds

$$P(\sigma_i^E \neq \sigma_i^A) = \left[ 1 + \exp\!\left( \frac{2 \rho_i^{AE}}{1 - (\rho_i^{AE})^2} \frac{H}{\sqrt{Q_i^A}} \frac{|h_i^E|}{\sqrt{Q_i^E}} \right) \right]^{-1} \qquad (5.18)$$

Although one might assume that this probability is minimal for
$|h_i^E| \approx H$, it is not the case. In contrast,
$P(\sigma_i^E \neq \sigma_i^A)$ reaches its maximum at $|h_i^E| = 0$ and is a
strictly decreasing function of $|h_i^E|$ as before.

This is clearly visible in figure 5.21. As there is no qualitative difference
compared to synchronization with random inputs, it is not possible to improve
the geometric attack by using H as additional information. Instead, it is still
optimal for E to flip the output of that hidden unit, which has the minimum
absolute value of the local field.

Figure 5.21: Prediction error as a function of the local field $h_i^E$ for
$Q_i^A = 1$, $Q_i^E = 1$, and $\rho = 0.5$.
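
The argument can be illustrated with a small numerical sketch. It assumes that the normalized local fields of corresponding hidden units are jointly Gaussian with correlation $\rho$, which gives a disagreement probability of the form $[1 + \exp(\frac{2\rho}{1-\rho^2} \frac{H}{\sqrt{Q^A}} \frac{|h^E|}{\sqrt{Q^E}})]^{-1}$ when $|h^A| = H$ is known. The function names are chosen for this example, and the correction step is a simplified version of the geometric attack.

```python
import math

def p_disagree(h_E, H, rho, Q_A=1.0, Q_E=1.0):
    """P(sigma_E != sigma_A) for known |h_A| = H (assumed Gaussian model)."""
    a = 2.0 * rho / (1.0 - rho**2)
    return 1.0 / (1.0 + math.exp(a * H / math.sqrt(Q_A)
                                 * abs(h_E) / math.sqrt(Q_E)))

def geometric_correction(h_E, tau_A):
    """Flip the hidden unit with the smallest |h| if the outputs disagree."""
    sigma = [1 if h > 0 else -1 for h in h_E]
    if math.prod(sigma) != tau_A:
        j = min(range(len(h_E)), key=lambda i: abs(h_E[i]))
        sigma[j] = -sigma[j]
    return sigma

for h in (0.0, 0.5, 1.0, 2.0):
    print(h, p_disagree(h, H=1.0, rho=0.5))       # decreases with |h|
print(geometric_correction([0.3, -1.2, 0.05], tau_A=+1))
```

Since the probability decreases monotonically in $|h^E|$ for every H, selecting the unit with the smallest absolute local field remains the best guess.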



5.4.2 Information about weight vectors 

While H cannot be used directly in the geometric attack, queries give E additional
information about the weight vectors in A's and B's Tree Parity Machines. But
fortunately the absolute local field H used for synchronization with queries is
lower than the average value

$$\langle |h_i| \rangle = \sqrt{2 Q_i / \pi} \approx 0.8 \sqrt{Q_i} \qquad (5.19)$$

observed for random inputs. Hence the overlap

$$\rho_{i,\mathrm{in}} = \frac{\mathbf{w}_i \cdot \mathbf{x}_i}{\sqrt{\mathbf{w}_i \cdot \mathbf{w}_i} \sqrt{\mathbf{x}_i \cdot \mathbf{x}_i}} = \frac{1}{\sqrt{N}} \frac{h_i}{\sqrt{Q_i}} \qquad (5.20)$$

between input vector and weight vector is very small and converges to zero in the
limit $N \to \infty$, although H > 0. Consequently, $\mathbf{x}_i$ and
$\mathbf{w}_i$ are nearly perpendicular to each other, so that the information
revealed by queries is minimized [23].

In fact, for a given value of H the number of weight vectors which are
consistent with a given query is still exponentially large. As an example, there are
$2.8 \times 10^{129}$ possible weight vectors for a query with H = 10, L = 10, and
N = 100 [23]. Consequently, E cannot benefit from the information contained in the
input vectors generated by A and B.
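
The size of this number can be checked by exact counting. The sketch assumes the normalization $h = \mathbf{w} \cdot \mathbf{x} / \sqrt{N}$, so that a query with H = 10 and N = 100 fixes $\mathbf{w} \cdot \mathbf{x} = 100$, and it treats the local field as exactly equal to H. Because the inputs are $\pm 1$ and the weight range is symmetric, the count does not depend on the particular query.

```python
def count_weight_vectors(target, L, N):
    """Count w in {-L..L}^N with sum_j w_j x_j == target for fixed x."""
    counts = [1]                       # zero components: only the sum 0
    for n in range(1, N + 1):
        prefix = [0]
        for c in counts:               # prefix sums of the old counts
            prefix.append(prefix[-1] + c)
        last = len(counts) - 1
        new = []
        for idx in range(2 * n * L + 1):   # idx = sum + n * L
            lo = max(idx - 2 * L, 0)
            hi = min(idx, last)
            new.append(prefix[hi + 1] - prefix[lo])
        counts = new
    idx = target + N * L
    return counts[idx] if 0 <= idx < len(counts) else 0

n_consistent = count_weight_vectors(target=100, L=10, N=100)
print(f"{n_consistent:.2e}")
```

Python integers have arbitrary precision, so the count is exact; only the final formatting rounds it.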



Chapter 6 

Conclusions and outlook 



In this thesis the synchronization of neural networks by learning from each other
has been analyzed and discussed. At first glance this effect looks like an extension
of online learning to a series of examples generated by a time dependent rule.
However, it turns out that neural synchronization is a more complicated dynamical
process, so that new phenomena occur.

This is especially true for Tree Parity Machines, where synchronization is
driven by stochastic attractive and repulsive forces. Because this process does
not have a self-averaging order parameter, one has to take the whole distribution
of the weights into account instead of using just the average value of the order
parameter to determine the dynamics of the system. This can be done using
direct simulations of the variables $w_{i,j}$ for finite N or an iterative calculation
of their probability distribution in the limit $N \to \infty$.

While one can use different learning rules both for bidirectional synchronization
and unidirectional learning, they show similar behavior and converge to the
random walk learning rule in the limit $N \to \infty$. So the deviations caused by
Hebbian and anti-Hebbian learning are, in fact, finite-size effects, which only
become relevant for $L = O(\sqrt{N})$.

In contrast, numerical simulations as well as iterative calculations show a 
phenomenon, which is significant even in very large systems: In the case of Tree 
Parity Machines learning by listening is much slower than mutual synchroniza- 
tion. This effect is caused by different possibilities of interaction. Two neural 
networks, which can influence each other, are able to omit steps, if they caused a 
repulsive effect. This is an advantage compared to a third Tree Parity Machine, 
which is trained using the examples produced by the other two and cannot select 
the most suitable input vectors for learning. Consequently, if interaction is only 
possible in one direction, the frequency of repulsive steps is higher than in the 
case of bidirectional communication. 

Although the overlap $\rho$ is not a self-averaging quantity, one can describe neural
synchronization as a random walk in $\rho$-space. Here the average step sizes
$\langle \Delta\rho_a \rangle$ and $\langle \Delta\rho_r \rangle$ are the same for
synchronization and learning. But the transition probabilities $P_a(\rho)$ and
$P_r(\rho)$ depend on the type of interaction. As a result one can observe
qualitative differences regarding the dynamics of the overlap. In the case of
K = 3 and bidirectional interaction the average change of the overlap
$\langle \Delta\rho \rangle$ is strictly positive, so that synchronization by mutual
learning happens on average. But for K > 3 or unidirectional interaction the higher
probability of repulsive steps causes a fixed point of the dynamics at
$\rho_f < 1$. Then reaching the absorbing state at $\rho = 1$ is only possible by
means of fluctuations.

While both mechanisms lead to full synchronization eventually, one observes 
two different distributions of the synchronization time depending on the function 
$\langle \Delta\rho(\rho) \rangle$. In the case of synchronization on average, it is a Gumbel distribution,
because one has to wait until the last weight has synchronized. Analytical calculations
for systems without repulsive steps yield the result $\langle t_{\mathrm{sync}} \rangle \propto L^2 \ln N$. A
few repulsive steps do not change this scaling behavior, but simply increase the
constant of proportionality.

In contrast, if synchronization is only possible by means of fluctuations, there
is a constant probability per step to get over the gap with $\langle \Delta\rho(\rho) \rangle < 0$ between
the fixed point and the absorbing state. Of course, this yields an exponential
distribution of the synchronization time. However, the fluctuations of the overlap
in the steady state decrease proportional to $L^{-1}$. As they are essential for reaching
$\rho = 1$ in this case, the synchronization time grows exponentially with increasing
synaptic depth of the Tree Parity Machines.

Without this difference a secure key-exchange protocol based on the synchro- 
nization of neural networks would be impossible. But as A's and B's Tree Parity 
Machines indeed synchronize faster than E's neural networks, the partners can 
use the synchronized weight vectors as a secret session key. Of course, there is 
a small probability $P_E$ that E is successful before A and B have finished their
key exchange due to the stochastic nature of the synchronization process. But
fortunately $P_E$ drops exponentially with increasing $L$ for nearly all combinations
of learning rules and attack methods. Thus A and B can achieve any level of 
security by just increasing the synaptic depth L. 

Additionally, there are other observations which indicate that bidirectional 
interaction is an advantage for A and B compared to a passive attacker E. For a 
time series generated by two Tree Parity Machines the version space of compatible 
initial conditions is larger, if both are already synchronized at the beginning, than 
if the neural networks start unsynchronized. So it is harder for an attacker to 
imitate B because of the interaction between the partners. And, of course, the 
attack methods are unable to extract all the information which is necessary to 
achieve full synchronization. This effect is mainly caused by the fact, that A and 
B can choose the most useful input vectors from the random sequence, but E 
does not have this ability 4^ . 

Thus the partners can improve the security of neural cryptography further, if 
they use a more advanced algorithm to select the input vectors. This approach 
eventually leads to synchronization with queries. In this variant of the key-exchange
protocol A and B ask each other questions, which depend on the weights
in their own networks. In doing so they are able to choose the absolute value of 
the local field in the Tree Parity Machine generating the current query. Of course, 
this affects both synchronization and attacks. However, E is at a disadvantage 
compared to A and B, because she needs a higher absolute value of the local field 
than the partners in order to synchronize on average. Therefore it is possible to 
adjust the new parameter H in such a way, that A and B synchronize fast, but 
E is not successful regardless of the attack method. 

However, the algorithm generating the input vectors does not matter for the 
opponent. E has no influence on it and the relative efficiency of the attacks stays
the same, whether a random input sequence or queries are used. In both cases 
the majority attack is the best method as long as the synaptic depth is large. 
Only if L is small, the genetic attack is better. Of course, both advanced attacks 
are always more successful than the geometric attack. And the simple attack is 
only useful for $K \gg 3$.

In any case, the effort of the partners grows only polynomially, while the suc- 
cess probability of an attack drops exponentially, if the synaptic depth increases. 
Similar scaling laws can be found, if one looks at other cryptographic systems. 
Only the parameter is different. While the security of conventional cryptography
[25, 26] depends on the length of the key, the synaptic depth of the Tree Parity
Machines plays the same role in the case of neural cryptography [22].

Brute-force attacks are not very successful, either. Here the number of keys 
grows exponentially with the system size $N$, while the synchronization time is
only proportional to $\log N$. Thus A and B can use large systems without much
additional effort in order to prevent successful guessing of the generated key. 

Consequently, the neural key-exchange protocol is secure against all attacks 
known up to now. However, there is always the risk that one might find a clever 
attack, which breaks the security of neural cryptography completely, because it 
is hardly ever possible to prove the security of such an algorithm [25].

However, the neural key-exchange protocol is different from conventional cryp- 
tographic algorithms in one aspect. Here effects in a physical system, namely at- 
tractive and repulsive stochastic forces, are used instead of those found in number 
theory. In fact, the trap door function is realized by a dynamics, which is differ- 
ent for partners and attackers based on their possibilities of interaction with the 
other participants. Of course, neural networks are not the only type of systems 
with these properties. Any other system showing similar effects can be used for 
such a cryptographic application, too. 



Interesting systems include chaotic maps and coupled lasers [46–49]. In both
cases one observes that synchronization is achieved faster for bidirectional than
for unidirectional coupling. As this underlying effect is very similar, one can use
nearly the same cryptographic protocol by just substituting the neural networks.
Of course, this applies to the attack methods, too. For example, the algorithms of
the majority attack and the genetic attack are so general that they are also useful
methods for attacks on key-exchange protocols using chaotic maps. In contrast,
the geometric correction algorithm is rather specific for neural networks, so that
it has to be replaced by appropriate methods.

Consequently, the neural key-exchange protocol is only the first element of 
a class of new cryptographic algorithms. Of course, all these proposals have to 
be analyzed in regard to efficiency and security. For that purpose the methods 
used in this thesis can probably serve as guidance. In particular, synchronization by
fluctuations and synchronization on average are rather abstract concepts, so that
one should be able to observe them in a lot of systems.

Another interesting direction is the implementation of the neural key-exchange
protocol. Computer scientists are already working on a hardware realization of
interacting Tree Parity Machines for cryptographic purposes [50–55]. In particular,
they have found that neural synchronization only needs very basic mathematical
operations and, therefore, is very fast compared to algorithms based on
number theory. Consequently, one can use neural cryptography in small embedded
systems, which are unable to use RSA or other established methods [25, 26].
Here it does not matter that the neural key-exchange protocol only reaches a
moderate level of security as long as one requires a small synchronization time.
But integrated circuits can achieve a very high frequency of key updates, which
compensates for this disadvantage [50–52].



Finally, these approaches indicate that further development of neural cryp- 
tography is indeed possible. As mentioned before, there are, in fact, two distinct 
directions: First, one can extend the neural key-exchange protocol in order to 
improve the efficiency, security and usefulness for specific cryptographic applica- 
tions, e. g. embedded systems. Second, one can replace the neural networks by 
other physical systems, e. g. chaotic lasers, which have similar properties to those 
identified in this thesis as essential for security. 



Appendix A 
Notation 



A                   sender
B                   receiver
E                   attacker

$K$                 number of hidden units in a Tree Parity Machine
$L$                 synaptic depth of the neural networks
$N$                 number of neurons per hidden unit
$M$                 (maximum) number of attacking networks
$H$                 absolute set value of the local field
$R$                 threshold for the reset of the input vectors
$U$                 minimal fitness
$V$                 length of the output history

$\mathbf{w}_i$      weight vector of the i-th hidden unit
$\mathbf{x}_i$      input vector of the i-th hidden unit
$w_{i,j}$           j-th element of $\mathbf{w}_i$
$x_{i,j}$           j-th element of $\mathbf{x}_i$
$\sigma_i$          output of the i-th hidden unit
$\tau$              total output of a Tree Parity Machine
$h_i$               local field of the i-th hidden unit
$\rho_i$            overlap of the i-th hidden unit
$\epsilon_i$        generalization error
[...]               prediction error

$P_a$               probability of attractive steps
$P_r$               probability of repulsive steps
$\Delta\rho_a$      step size of an attractive step
$\Delta\rho_r$      step size of a repulsive step
$\langle \Delta\rho \rangle$   average change of the overlap
$\rho_f$            fixed point of the dynamics
$\sigma_f$          width of the $\rho$-distribution at the fixed point

$I$                 mutual information
$S$                 entropy of a weight distribution
$S_0$               maximal entropy of a single neural network
[...]               number of possible weight configurations
[...]_key           number of distinct keys
[...]               size of the version space

$T$                 synchronization time for two random walks
$T_N$               synchronization time for $N$ pairs of random walks
$t_{\mathrm{sync}}$ synchronization time for two Tree Parity Machines

$P_E$               success probability of an attacker
$y$                 sensitivity of $P_E$ in regard to $L$
[...]               minimal value of $L$ for the exponential decay of $P_E$
$\alpha$            rescaled local field $H / \sqrt{L}$
[...]               minimum $\alpha$ for synchronization
[...]               maximum $\alpha$ for security
[...]               sensitivity of $P_E$ in regard to $H$
[...]               offset of $P_E(H)$

$\gamma$            Euler-Mascheroni constant ($\gamma \approx 0.577$)



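Auxiliary functions for the learning rules

• control signal

$$f(\sigma, \tau^A, \tau^B) = \Theta(\sigma \tau^A)\, \Theta(\tau^A \tau^B)
\begin{cases}
\sigma & \text{Hebbian learning rule} \\
-\sigma & \text{anti-Hebbian learning rule} \\
1 & \text{random walk learning rule}
\end{cases}$$

• boundary condition

$$g(w) =
\begin{cases}
\mathrm{sgn}(w)\, L & \text{for } |w| > L \\
w & \text{otherwise}
\end{cases}$$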
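
The two auxiliary functions can be written down directly. The following is a minimal sketch. The rule names passed as strings, the helper `theta` and the convention $\Theta(0) = 0$ are assumptions made here for illustration (the arguments are products of $\pm 1$, so zero never occurs in practice).

```python
import numpy as np

def theta(x):
    """Heaviside step function; theta(0) = 0 is an assumption of this sketch."""
    return 1 if x > 0 else 0

def f(sigma, tau_a, tau_b, rule="random_walk"):
    """Control signal: nonzero only if sigma == tau_a and tau_a == tau_b."""
    gate = theta(sigma * tau_a) * theta(tau_a * tau_b)
    factor = {"hebbian": sigma, "anti_hebbian": -sigma, "random_walk": 1}[rule]
    return gate * factor

def g(w, L):
    """Boundary condition: weights outside [-L, L] are reset to sgn(w) * L."""
    return np.clip(w, -L, L)

if __name__ == "__main__":
    # Hypothetical single update of one hidden unit, assuming the form w <- g(w + f * x)
    w = np.array([3, -3, 0, 2])
    x = np.array([1, -1, 1, -1])
    print(g(w + f(1, 1, 1, "hebbian") * x, L=3))
```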



Appendix B 
Iterative calculation 



This appendix presents the algorithm, which is used to calculate the time evolution
of the weight distribution iteratively in the limit $N \to \infty$. Compared
to direct simulations, the weights are replaced by $(2L + 1) \times (2L + 1)$ variables $p^i_{a,b}$,
which describe the probability that one finds a weight with $w^A_{i,j} = a$ and $w^B_{i,j} = b$.
Consequently, one has to adapt both the calculation of the output bits and the
update of the weight configuration.



B.1 Local field and output bits



According to its definition (2.3) the local field $h_i$ of a hidden unit is proportional
to the sum over independent random variables $w_{i,j} x_{i,j}$. Therefore the central
limit theorem applies and the probability to find certain values of $h_i^A$ and $h_i^B$ in
a time step is given by

$$P(h_i^A, h_i^B) = \frac{\exp\left[ -\frac{1}{2} (h_i^A, h_i^B)\, C_i^{-1}\, (h_i^A, h_i^B)^T \right]}{2\pi \sqrt{\det C_i}} . \qquad \text{(B.1)}$$

In this equation the covariance matrix $C_i$ describes the correlations between A's
and B's Tree Parity Machines in terms of the well-known order parameters $Q$ and
$R$, which are functions of the weight distribution according to (2.11), (2.12), and [...]:

$$C_i = \begin{pmatrix} Q_i^A & R_i^{AB} \\ R_i^{AB} & Q_i^B \end{pmatrix} . \qquad \text{(B.2)}$$

In order to generate local fields $h_i^A$ and $h_i^B$, which have the correct joint probability
distribution (B.1), the following algorithm is used. A pseudo-random number
generator produces two independent uniformly distributed random numbers
$z_1, z_2 \in [0, 1[$. Then the local fields are given by [56]

$$h_i^A = \sqrt{-2 Q_i^A \ln(z_1)}\, \cos(2\pi z_2) , \qquad \text{(B.3)}$$

$$h_i^B = \sqrt{-2 Q_i^B \ln(z_1)} \left[ \rho_i \cos(2\pi z_2) + \sqrt{1 - \rho_i^2}\, \sin(2\pi z_2) \right] . \qquad \text{(B.4)}$$
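
The generation of correlated local fields can be sketched as follows. This is a minimal illustration of the Box-Muller construction in (B.3) and (B.4), not the implementation used for the thesis. The function name and its arguments are hypothetical, the overlap is assumed to be $\rho = R / \sqrt{Q^A Q^B}$, and $z_1$ is drawn from $]0, 1]$ instead of $[0, 1[$ so that the logarithm is always finite.

```python
import numpy as np

def sample_local_fields(q_a, q_b, r, size, rng):
    """Draw `size` pairs (h_a, h_b) with variances q_a, q_b and covariance r."""
    rho = r / np.sqrt(q_a * q_b)               # assumed definition of the overlap
    z1 = 1.0 - rng.random(size)                # uniform in ]0, 1], avoids log(0)
    z2 = rng.random(size)                      # uniform in [0, 1[
    log_term = -2.0 * np.log(z1)
    h_a = np.sqrt(q_a * log_term) * np.cos(2 * np.pi * z2)
    h_b = np.sqrt(q_b * log_term) * (
        rho * np.cos(2 * np.pi * z2) + np.sqrt(1 - rho**2) * np.sin(2 * np.pi * z2)
    )
    return h_a, h_b

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h_a, h_b = sample_local_fields(2.0, 3.0, 1.2, 100_000, rng)
    # Empirical second moments should be close to Q^A, Q^B and R
    print(np.var(h_a), np.var(h_b), np.mean(h_a * h_b))
```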



Afterwards one can calculate the outputs $\sigma_i$ and $\tau$ in the same way as in the case
of direct simulations. As the local fields are known, it is possible to implement
the geometric correction, too. Therefore this method is also suitable to study
synchronization by unidirectional learning, e. g. for a geometric attacker. Additionally,
the algorithm can be extended to three and more interacting Tree Parity
Machines [19].



B.2 Equations of motion 

The equations of motion are generally independent of the learning rule, because
the behavior of Hebbian and anti-Hebbian learning converges to that of the random
walk learning rule in the limit $N \to \infty$. Consequently, the weights in both
participating Tree Parity Machines stay uniformly distributed, only the correlations
between $w^A_{i,j}$ and $w^B_{i,j}$ change.

Attractive steps 

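In an attractive step corresponding weights move in the same direction. Thus
the distribution of the weights changes according to the following equations of
motion for $-L < a, b < L$:

$$\begin{aligned}
p^{i+}_{a,b} &= \tfrac{1}{2} \left( p^i_{a+1,b+1} + p^i_{a-1,b-1} \right) , & \text{(B.5)} \\
p^{i+}_{a,L} &= \tfrac{1}{2} \left( p^i_{a-1,L} + p^i_{a-1,L-1} \right) , & \text{(B.6)} \\
p^{i+}_{a,-L} &= \tfrac{1}{2} \left( p^i_{a+1,-L} + p^i_{a+1,-L+1} \right) , & \text{(B.7)} \\
p^{i+}_{L,b} &= \tfrac{1}{2} \left( p^i_{L,b-1} + p^i_{L-1,b-1} \right) , & \text{(B.8)} \\
p^{i+}_{-L,b} &= \tfrac{1}{2} \left( p^i_{-L,b+1} + p^i_{-L+1,b+1} \right) , & \text{(B.9)} \\
p^{i+}_{L,L} &= \tfrac{1}{2} \left( p^i_{L-1,L-1} + p^i_{L-1,L} + p^i_{L,L-1} + p^i_{L,L} \right) , & \text{(B.10)} \\
p^{i+}_{-L,-L} &= \tfrac{1}{2} \left( p^i_{-L+1,-L+1} + p^i_{-L+1,-L} + p^i_{-L,-L+1} + p^i_{-L,-L} \right) , & \text{(B.11)} \\
p^{i+}_{L,-L} &= 0 , & \text{(B.12)} \\
p^{i+}_{-L,L} &= 0 . & \text{(B.13)}
\end{aligned}$$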
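
The attractive step, including all of its boundary cases, can be expressed compactly as two clipped shifts of the probability array. The sketch below is an illustration under the assumption that the distribution is stored as a $(2L+1) \times (2L+1)$ NumPy array whose index is the weight value shifted by $L$. The helper names are hypothetical.

```python
import numpy as np

def shift_clipped(p, axis, direction):
    """Move all probability mass one step along `axis`; mass at the boundary stays there."""
    q = np.moveaxis(p, axis, 0)
    out = np.zeros_like(q)
    if direction > 0:
        out[1:] += q[:-1]      # regular move to the next higher weight value
        out[-1] += q[-1]       # weights already at +L remain at +L
    else:
        out[:-1] += q[1:]      # regular move to the next lower weight value
        out[0] += q[0]         # weights already at -L remain at -L
    return np.moveaxis(out, 0, axis)

def attractive_step(p):
    """Both weights move up or both move down, each with probability 1/2."""
    up = shift_clipped(shift_clipped(p, 0, +1), 1, +1)
    down = shift_clipped(shift_clipped(p, 0, -1), 1, -1)
    return 0.5 * (up + down)

if __name__ == "__main__":
    L = 2
    p = np.full((2 * L + 1, 2 * L + 1), 1.0 / (2 * L + 1) ** 2)   # uncorrelated start
    for _ in range(50):
        p = attractive_step(p)
    print("probability of equal weights:", np.trace(p))
```

Because clipping acts on each coordinate separately, composing the two one-dimensional shifts reproduces the edge and corner equations without treating them case by case.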



Repulsive steps 

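In a repulsive step only the weights in one hidden unit move, either in A's or in
B's Tree Parity Machine. However, the active hidden unit is selected randomly
by the output bits, so that both possibilities occur with equal probability. Thus
one can combine them in one set of equations for $-L < a, b < L$:

$$\begin{aligned}
p^{i+}_{a,b} &= \tfrac{1}{4} \left( p^i_{a+1,b} + p^i_{a-1,b} + p^i_{a,b+1} + p^i_{a,b-1} \right) , & \text{(B.14)} \\
p^{i+}_{a,L} &= \tfrac{1}{4} \left( p^i_{a+1,L} + p^i_{a-1,L} + p^i_{a,L} + p^i_{a,L-1} \right) , & \text{(B.15)} \\
p^{i+}_{a,-L} &= \tfrac{1}{4} \left( p^i_{a+1,-L} + p^i_{a-1,-L} + p^i_{a,-L+1} + p^i_{a,-L} \right) , & \text{(B.16)} \\
p^{i+}_{L,b} &= \tfrac{1}{4} \left( p^i_{L,b} + p^i_{L-1,b} + p^i_{L,b+1} + p^i_{L,b-1} \right) , & \text{(B.17)} \\
p^{i+}_{-L,b} &= \tfrac{1}{4} \left( p^i_{-L+1,b} + p^i_{-L,b} + p^i_{-L,b+1} + p^i_{-L,b-1} \right) , & \text{(B.18)} \\
p^{i+}_{L,L} &= \tfrac{1}{4} \left( 2 p^i_{L,L} + p^i_{L-1,L} + p^i_{L,L-1} \right) , & \text{(B.19)} \\
p^{i+}_{-L,-L} &= \tfrac{1}{4} \left( 2 p^i_{-L,-L} + p^i_{-L+1,-L} + p^i_{-L,-L+1} \right) , & \text{(B.20)} \\
p^{i+}_{L,-L} &= \tfrac{1}{4} \left( 2 p^i_{L,-L} + p^i_{L-1,-L} + p^i_{L,-L+1} \right) , & \text{(B.21)} \\
p^{i+}_{-L,L} &= \tfrac{1}{4} \left( 2 p^i_{-L,L} + p^i_{-L+1,L} + p^i_{-L,L-1} \right) . & \text{(B.22)}
\end{aligned}$$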
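
As a cross-check of the repulsive equations, one can push the probability mass of every state explicitly to its four possible successors. The following sketch assumes, as an illustration, that the distribution is stored as a $(2L+1) \times (2L+1)$ NumPy array indexed by the weight values shifted by $L$. The function name is hypothetical.

```python
import numpy as np

def repulsive_step(p):
    """Exactly one of the two weights moves by +1 or -1; each of the 4 moves has probability 1/4."""
    size = p.shape[0]
    out = np.zeros_like(p)
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    for a in range(size):
        for b in range(size):
            for da, db in moves:
                # Boundary condition: a move beyond +-L leaves the weight at the boundary
                na = min(max(a + da, 0), size - 1)
                nb = min(max(b + db, 0), size - 1)
                out[na, nb] += 0.25 * p[a, b]
    return out

if __name__ == "__main__":
    L = 2
    p = np.eye(2 * L + 1) / (2 * L + 1)        # fully synchronized start: a == b
    p = repulsive_step(p)
    print("probability of equal weights after one repulsive step:", np.trace(p))
```

The explicit loops are slow for large $L$, but they make it easy to verify each boundary equation separately.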



Inverse attractive steps 

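Inverse attractive steps are only possible if A and B do not interact at all, but
use common input vectors. In such a step corresponding weights move in the
opposite direction. Thus the distribution of the weights changes according to the
following equations of motion for $-L < a, b < L$:

$$\begin{aligned}
p^{i+}_{a,b} &= \tfrac{1}{2} \left( p^i_{a+1,b-1} + p^i_{a-1,b+1} \right) , & \text{(B.23)} \\
p^{i+}_{a,L} &= \tfrac{1}{2} \left( p^i_{a+1,L} + p^i_{a+1,L-1} \right) , & \text{(B.24)} \\
p^{i+}_{a,-L} &= \tfrac{1}{2} \left( p^i_{a-1,-L} + p^i_{a-1,-L+1} \right) , & \text{(B.25)} \\
p^{i+}_{L,b} &= \tfrac{1}{2} \left( p^i_{L,b+1} + p^i_{L-1,b+1} \right) , & \text{(B.26)} \\
p^{i+}_{-L,b} &= \tfrac{1}{2} \left( p^i_{-L,b-1} + p^i_{-L+1,b-1} \right) , & \text{(B.27)} \\
p^{i+}_{L,L} &= 0 , & \text{(B.28)} \\
p^{i+}_{-L,-L} &= 0 , & \text{(B.29)} \\
p^{i+}_{L,-L} &= \tfrac{1}{2} \left( p^i_{L-1,-L+1} + p^i_{L-1,-L} + p^i_{L,-L+1} + p^i_{L,-L} \right) , & \text{(B.30)} \\
p^{i+}_{-L,L} &= \tfrac{1}{2} \left( p^i_{-L+1,L-1} + p^i_{-L+1,L} + p^i_{-L,L-1} + p^i_{-L,L} \right) . & \text{(B.31)}
\end{aligned}$$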
Appendix C
Generation of queries 



This appendix describes the algorithm [23] used to generate a query $\mathbf{x}_i$, which
results in a previously chosen local field $h_i$. Finding an exact solution is similar to
the knapsack problem [57] and can be very difficult depending on the parameters.
Hence this is not useful for simulations, as one has to generate a huge number of
input vectors $\mathbf{x}_i$ in this case. Instead a fast algorithm is employed, which gives
an approximate solution.

As both inputs $x_{i,j}$ and weights $w_{i,j}$ are discrete, there are only $2L + 1$ possible
results for the product $w_{i,j} x_{i,j}$. Therefore a set of input vectors consisting of all
permutations, which do not change $h_i$, can be described by counting the number
$c_{i,l}$ of products with $w_{i,j} x_{i,j} = l$. Then the local field is given by

$$h_i = \frac{1}{\sqrt{N}} \sum_{l=1}^{L} l \left( c_{i,l} - c_{i,-l} \right) , \qquad \text{(C.1)}$$

which depends on both inputs and weights. But the sum $n_{i,l} = c_{i,l} + c_{i,-l}$ is
equal to the number of weights with $|w_{i,j}| = |l|$ and thus independent of $\mathbf{x}_i$.
Consequently, one can write $h_i$ as a function of only $L$ variables,

$$h_i = \frac{1}{\sqrt{N}} \sum_{l=1}^{L} l \left( 2 c_{i,l} - n_{i,l} \right) , \qquad \text{(C.2)}$$

as the values of $n_{i,l}$ are defined by the current weight vector $\mathbf{w}_i$.

In the simulations the following algorithm [23] is used to generate the queries.
First the output $\sigma_i$ of the hidden unit is chosen randomly, so that the set value
of the local field is given by $h_i = \sigma_i H$. Then the values of $c_{i,L}, c_{i,L-1}, \dots, c_{i,1}$
are calculated successively. For that purpose one of the following equations is
selected randomly with equal probability, either

$$c_{i,l} = \left\lfloor \frac{n_{i,l} + 1}{2} + \frac{1}{2l} \left( \sigma_i H \sqrt{N} - \sum_{j=l+1}^{L} j \left( 2 c_{i,j} - n_{i,j} \right) \right) \right\rfloor \qquad \text{(C.3)}$$

or

$$c_{i,l} = \left\lceil \frac{n_{i,l} - 1}{2} + \frac{1}{2l} \left( \sigma_i H \sqrt{N} - \sum_{j=l+1}^{L} j \left( 2 c_{i,j} - n_{i,j} \right) \right) \right\rceil \qquad \text{(C.4)}$$

in order to reduce the influence of rounding errors. Additionally, one has to take
the condition $0 \le c_{i,l} \le n_{i,l}$ into account. If equation (C.3) or equation (C.4)
yields a result outside this range, $c_{i,l}$ is reset to the nearest boundary value.

Afterwards the input vector $\mathbf{x}_i$ is generated. Those $x_{i,j}$ associated with zero
weights $w_{i,j} = 0$ do not influence the local field, so that their value is just chosen
randomly. But the other input bits are divided into $L$ groups according to
the absolute value $l = |w_{i,j}|$ of their corresponding weight. Then $c_{i,l}$ input bits
are selected randomly in each group and set to $x_{i,j} = \mathrm{sgn}(w_{i,j})$, while the other
$n_{i,l} - c_{i,l}$ inputs are set to $x_{i,j} = -\mathrm{sgn}(w_{i,j})$.

Simulations show that queries generated by this algorithm result in local fields
$h_i$ which match the set value $\sigma_i H$ on average [...]. Additionally, only very small
deviations are observed, which are caused by the restriction of inputs and weights
to discrete values. So this algorithm is indeed suitable for the generation of
queries.
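
The procedure described in this appendix can be sketched for a single hidden unit as follows. This is an illustrative reading of the steps above, not the original simulation code. The function name and argument order are hypothetical, the rounding is assumed to be floor and ceiling of the two shifted expressions, and the local field is assumed to carry the prefactor $1/\sqrt{N}$.

```python
import math
import numpy as np

def generate_query(w, H, L, rng):
    """Return (x, sigma) such that the local field w.x / sqrt(N) is close to sigma * H."""
    N = w.size
    sigma = int(rng.choice([-1, 1]))           # randomly chosen output of the hidden unit
    target = sigma * H * math.sqrt(N)          # desired value of sum_j w_j x_j
    x = rng.choice([-1, 1], size=N)            # inputs at zero weights stay random
    partial = 0                                # sum over j > l of j * (2 c_j - n_j)
    for l in range(L, 0, -1):                  # compute c_L, c_{L-1}, ..., c_1 successively
        idx = np.flatnonzero(np.abs(w) == l)
        n = idx.size
        y = (target - partial) / (2 * l)
        if rng.random() < 0.5:
            c = math.floor((n + 1) / 2 + y)    # first rounding variant
        else:
            c = math.ceil((n - 1) / 2 + y)     # second rounding variant
        c = min(max(c, 0), n)                  # reset to the nearest boundary value
        partial += l * (2 * c - n)
        order = rng.permutation(idx)           # random selection within the group
        x[order[:c]] = np.sign(w[order[:c]])   # c products equal to +l
        x[order[c:]] = -np.sign(w[order[c:]])  # n - c products equal to -l
    return x, sigma

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L, N, H = 3, 1000, 1.5
    w = rng.integers(-L, L + 1, size=N)
    x, sigma = generate_query(w, H, L, rng)
    print("set value:", sigma * H, "local field:", np.dot(w, x) / math.sqrt(N))
```

As long as no group hits the boundary condition in the last step, the sum $\sum_j w_{i,j} x_{i,j}$ differs from the target by at most 1 in this sketch, so the deviation of the local field is of order $1/\sqrt{N}$.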